PREFACE


The purpose of this WSDG Technical Paper is to review several statistical
methods commonly used in water quality data analysis. It is not intended to
be another detailed statistical text, but rather to provide Forest
Hydrologists with a concise review of basic statistics. Throughout the paper
the assumptions underlying the various statistical tests have been
emphasized. In addition, water quality examples have been included
throughout. Although it is anticipated that you will perform most of your
statistical analyses on programmable calculators or computers, the basic
formulas and procedures involved in the various statistical tests have been
included to help give you a better perspective of what is going on in the
"canned" programs.

This Technical Paper was designed to be used in conjunction with WSDG
Technical Paper 00002, "Water Quality Monitoring Programs," and WSDG
Application Documents 00001 and 00002, "Statistical Analysis Using SAS at
the USEPA National Computer Center" and "Statistical Analysis Using SPSS at
the USDA Fort Collins Computer Center," respectively. Together, these two
Technical Papers and two Application Documents form the "Water Quality
Handbook," scheduled for release in late winter of 1981 by the USDA Forest
Service.

I would like to acknowledge all the people who contributed to the
development of this Technical Paper. A very special thanks is given to
Dave Ryn, Computer Specialist and Statistician, WSDG; Walt Hivner,
Statistician, Colorado State University; and Bob Averett, Hydrologist,
USGS-WRD, who performed the detailed technical review. I would also like to
extend my appreciation to Jim Ingwersen, Cindi Eichin, Gordon Snyder, Eric
Siverts, and all the Forest Hydrologists who provided many helpful
suggestions concerning content and format.
STATISTICAL METHODS COMMONLY USED IN
WATER QUALITY DATA ANALYSIS


1.0  Introduction 

The proper use of statistical methods in the reduction and analysis of
water quality data is very important. These methods allow us to take an
objective view of the data and, hopefully, keep us from making fools of
ourselves by claiming that a favorite theory is substantiated by
observations that do nothing of the sort (Colquhoun, 1971). This is not to
imply that field observations are not important in the interpretive
process, for indeed they are. However, these observations should be
tempered with the objectivity of statistical methods. Averett (1979)
states:

"data interpretation is an intellectual activity; statistical
application is a mechanical activity. Good experimental design
eases the interpretation burden, and statistical methods are best
applied in evaluating the adequacy or applicability of the
selected experimental design. Statistical methods are,
therefore, extremely useful tools in the reduction and analysis
of data, and as such, can be used as an aid in data
interpretation. The statistical testing of data can provide
'yes' or 'no' answers at a given probability level--nothing
more."


The purpose of this WSDG Technical Paper is to review several selected
statistical methods. It is not intended to be another detailed statistical
text. If you feel you need more information concerning a specific
statistical method, two texts recommended are Biometry by Sokal and Rohlf
(1969) and Statistical Analysis of Samples of Benthic Invertebrates by
Elliott (1977).
This paper begins with a series of definitions of common statistical
terms followed by (1) a brief discussion of the underlying assumptions and
data requirements for parametric statistical testing, (2) descriptive
statistics, (3) important theoretical probability density functions, (4)
confidence limits about the mean, (5) hypothesis testing, and (6) testing
the homogeneity of the variance. It is important that you understand these
concepts before proceeding, for they are fundamental to an understanding of
the subsequent statistical methods. Next, the powerful tools of (1)
analysis of variance and (2) regression and correlation are covered.
Throughout this paper, emphasis has been placed on the assumptions
underlying the various methods and the utility of these methods for the
analysis of water quality data.

To enhance your understanding of the various statistical methods
presented, water quality-related examples have been included throughout. It
is recognized that you will be doing most of your data analysis using
statistical packages designed for hand-held calculators or computers.
However, it is important that you clearly understand what the "canned"
programs are doing. Therefore, I have outlined the basic "number
crunching" operations associated with each method. The statistical package
that you use for your data analysis is generally a function of
availability. Two packages which are readily available to Forest
Hydrologists are SAS (Statistical Analysis System, 1979) and SPSS
(Statistical Package for Social Sciences, 1975). You can access SAS
through the EPA NCC-IBM computer (where STORET data are retained), and SPSS
through the USDA Fort Collins Computer Center. Many of the examples have
been solved using SAS and SPSS. A detailed discussion of the procedures
necessary to apply the various statistical methods using SAS and SPSS is
presented by Ingwersen (1981a and 1981b).
2.0 Basic Definitions


It  seems  that  any  discussion  of  statistics  always  begins  with  a series 
of  definitions.  It  helps  to  get  everyone  into  the  same  framework  for 
statistical  thinking  and,  as  such,  is  necessary.  What  follows  is  a brief 
journey  through  some  of  the  basic  definitions  fundamental  to  the  science  of 
statistics.  Only  the  ones  necessary  to  get  us  on  our  way  are  presented 
here.  As  we  proceed  into  the  maze  of  statistical  methods  and  concepts, 
this  list  of  definitions  will  grow. 

In a narrow sense, statistics refers to any of the various computed or
estimated statistical quantities, such as the mean, standard deviation, or
variance. In a broad sense, it can be thought of as the scientific study
of numerical data from natural phenomena, concerned with making inferences
about a population based upon information from a sample taken from the
population.

A collection of individual observations obtained by a specified
procedure, such as random sampling, is commonly referred to as a sample or
a sample of observations. If the sample has been collected properly, it
can be considered to be representative of a population containing all
possible observations. The specific property measured by the individual
observations is called the variable. Variables are subdivided into several
categories, two of which concern us. Continuous variables, such as calcium
or total dissolved solids concentrations, theoretically can assume an
infinite number of values between any two fixed points and are limited only
by the precision of the instrument used to make the measurement.

Discontinuous or discrete variables, such as fecal coliform or stonefly
nymph counts, are those that can take only certain fixed numerical values,
with no intermediate values possible in between. Usually, discrete data take
on integer values, such as 1, 2, 3, 100, or 200.
3.0 Underlying Assumptions and Data Requirements for Parametric
Statistical Testing

Most  water  quality  data  analysis  performed  by  Forest  Hydrologists  will 
employ  parametric  statistical  methods.  These  methods  are  the  ones  commonly 
taught  at  the  undergraduate  level  in  college  and  are  characterized  by  using 
tests  whose  models  specify  conditions  about  the  parameters  of  the 
population  from  which  the  sample  was  drawn.  There  are  several  basic 
assumptions  underlying  parametric  statistical  tests  of  which  you  should  be 
aware  whenever  you  apply  these  procedures.  These  are: 

1.  the  observations  are  independent; 

2.  the  distribution  of  the  population  is  known; 

3.  the  variances  of  the  populations  being  compared  are  equal  or  of 
known  ratio. 

In  addition,  these  statistical  tests  require  that  the  data: 

1.  were  collected  in  a random  manner; 

2.  have  error  variation  independent  of  the  means,  normally 
distributed  and  homogeneous;  and 

3.  have  variance  components  which  are  additive. 

The  data  commonly  encountered  in  water  quality  studies  rarely  satisfy 
all  the  assumptions  and  requirements  of  parametric  statistical  methods. 
However,  as  Glass  and  others  (1972)  state,  the  relevant  question  is  not 
whether  the  assumptions  and  requirements  are  met  exactly,  but  rather 
whether  the  plausible  violations  of  the  assumptions  and  requirements  have 
serious  consequences  on  the  validity  of  the  probability  statements  based 
upon  them.  At  this  point,  it  is  only  important  that  you  keep  these 
assumptions  and  requirements  in  mind.  As  we  develop  our  statistical 
arsenal,  I will  illustrate  procedures  which  can  be  applied  to  the  raw  data 
to  test  if  the  requirements  are  met  and  to  point  out  the  pitfalls  you  may 
encounter  in  your  inferences  when  these  assumptions  and  requirements  are 
violated. 
4.0 Descriptive Statistics


Data  analysis  generally  begins  with  numerical  summary  or  description 
of  the  data  set.  Two  types  of  descriptive  statistics  are  addressed  here: 
statistics  of  central  tendency  and  statistics  of  dispersion. 

The statistics of central tendency or location include the mean,
median, and mode. The mean is the one we are most familiar with and is a
statistic of great importance, since many statistical tests center around
comparison of sample means. The arithmetic mean (X̄) is calculated by
summing all the individual observations (ΣXᵢ) of a sample and dividing the
sum by the number of observations (n) in the sample (Equation 1).

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \tag{1}$$

As you recall, sample statistics are designed to be estimators of
population parameters. In this case X̄ is an estimate of μ, the population
mean. (Note: As a matter of convention in statistics, Greek letters are
used to denote population parameters while Roman letters are used to denote
sample statistics.)

The median divides a frequency distribution into two halves and is
defined as that value of a variable (in an array ordered from the smallest
to the largest) that has an equal number of observations on either side of
it. The median is easily determined when the sample array has an odd
number of observations: it is simply the [(n + 1)/2]th observation. When
the number of observations is even, however, it is determined by
calculating the mean of the (n/2)th and [(n/2) + 1]th observations. In
certain cases, when the distribution is asymmetric, as is generally the
case with coliform data, the median is a more representative measure of
location than the arithmetic mean.
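The ordering rule just described can be sketched in Python. This is a minimal illustration: `median_of` is a hypothetical helper name, and the small arrays are made up for the demonstration. The standard library's `statistics.median` applies the same odd/even rule, so it is shown alongside for comparison.

```python
import statistics

def median_of(values):
    """Median by the rule in the text: order the array, then take the middle
    value (odd n) or the mean of the (n/2)th and [(n/2)+1]th values (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # the single middle observation
    # ordered[mid - 1] is the (n/2)th value and ordered[mid] the [(n/2)+1]th (1-based)
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_of([7, 3, 9]))               # 7
print(median_of([7, 3, 9, 12]))           # 8.0
print(statistics.median([7, 3, 9, 12]))   # 8.0
```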




The  mode  is  defined  as  the  value  or  class  interval  having  the  greatest 
number  of  individuals.  In  some  cases,  a mode  will  not  exist,  such  as  when 
several  values  or  frequency  classes  have  the  same  number  of  observations. 
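A minimal Python sketch of the mode as defined here, returning `None` when the greatest frequency is shared by two or more values. The helper `mode_of` is hypothetical. Note that the standard library's `statistics.mode` does not follow this convention (since Python 3.8 it returns the first of several tied values), whereas `statistics.multimode` lists every tied value.

```python
from collections import Counter
import statistics

def mode_of(values):
    """Return the single most frequent value, or None when the top
    frequency is shared by two or more values (no mode, as in the text)."""
    counts = Counter(values)
    top = max(counts.values())
    winners = [value for value, count in counts.items() if count == top]
    return winners[0] if len(winners) == 1 else None

print(mode_of([3, 5, 5, 7]))                   # 5
print(mode_of([3, 3, 5, 5, 7]))                # None: 3 and 5 are tied
print(statistics.multimode([3, 3, 5, 5, 7]))   # [3, 5]
```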

Table 1 summarizes the characteristics, advantages and disadvantages
of the arithmetic mean, median and mode. In symmetrical distributions,
such as the normal distribution, the mean, median and mode are all
identical. In asymmetrical distributions the mean lies closest to the
drawn-out tail of the distribution, the mode lies farthest from it, and the
median lies between the two (Figure 1).

The statistics of dispersion include the range, variance and standard
deviation. The range is simply the difference between the largest and
smallest observations in a sample. Since the range can be affected greatly
by an extreme value, it should be considered only as a rough estimate of
the dispersion of all the observations in a sample.

Both the variance and standard deviation take all observations of a
sample into consideration and weigh each observation by its distance
(deviation) from the calculated center (mean) of the distribution. The
deviation (xᵢ) of the ith observation (Xᵢ) from the mean (X̄) can
be expressed as

$$x_i = X_i - \bar{X}$$

The variance (s²) is a modified average of the squared deviations
(Σxᵢ²) (Equation 2). The variance is expressed in squared units.

$$s^2 = \frac{\sum_{i=1}^{n} x_i^2}{n-1} \tag{2}$$
Table 1. A summary of the characteristics, advantages and disadvantages of the arithmetic mean, median and mode.

Figure 1. Relative position of the mean, median and mode for an asymmetrical distribution.

The sum of the squared deviations is divided by "n-1" in order to obtain
an unbiased estimate of the population variance. Statisticians have shown
that when the sum of squares (the sum of the squared deviations) is divided
by "n-1" rather than "n," the resulting estimate of the population variance
is unbiased, no matter what the sample size. Calculations that divide the
sum of squares by "n" tend to underestimate the variance (on average they
give only (n-1)/n of the population variance), especially when "n" is
small, and result in a biased estimate of the population variance.
Consequently, you should always use "n-1" in your calculation of the
estimate of the population variance. The quantity "n-1" is generally
referred to as the degrees of freedom (df). As you will see later, degrees
of freedom are used with many other statistics besides the variance.
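The bias described above can be checked numerically. The sketch below assumes, purely for illustration, a normal population with σ² = 100; it draws many samples of n = 5 and averages the two estimates. Dividing the sum of squares by n should average about (n - 1)/n of the population variance (here about 80), whereas dividing by n - 1 should average close to 100. The function names and simulation settings are illustrative choices.

```python
import random

def sum_of_squares(sample):
    """Sum of squared deviations from the sample mean (numerator of Equation 2)."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample)

def compare_divisors(n=5, sigma=10.0, trials=20000, seed=1):
    """Average the n-1 and n variance estimates over many random normal samples."""
    rng = random.Random(seed)
    total_unbiased = total_biased = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        ss = sum_of_squares(sample)
        total_unbiased += ss / (n - 1)  # divisor = degrees of freedom
        total_biased += ss / n          # divisor = sample size
    return total_unbiased / trials, total_biased / trials

unbiased, biased = compare_divisors()
print(f"divide by n-1: {unbiased:.1f}   divide by n: {biased:.1f}   population: 100.0")
```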

The standard deviation (s) is calculated by taking the square root of
the variance (Equation 3).

$$s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2}{n-1}} \tag{3}$$

The advantage of the standard deviation (s) over the variance (s²) is
that its units are not squared (they are the units of the original
observations) and, therefore, tend to be more meaningful.

Sometimes we wish to compare the amount of variation in populations
having different means. To accomplish this we use the coefficient of
variation (CV), which is simply the standard deviation expressed as a
percentage of the mean (Equation 4).

$$CV = \frac{s(100)}{\bar{X}} \tag{4}$$
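Equations 3 and 4 translate directly into code. In this sketch the helper name and the two small samples are hypothetical: both samples have the same standard deviation (s = 2), but their means differ by a factor of ten, so their coefficients of variation differ by the same factor. `statistics.stdev` uses the n - 1 divisor of Equation 2.

```python
import statistics

def coefficient_of_variation(sample):
    """CV (%) = 100 * s / mean, with s computed using the n-1 divisor (Equation 3)."""
    return 100.0 * statistics.stdev(sample) / statistics.mean(sample)

# Hypothetical samples with the same absolute spread but different means
low = [8, 10, 12]
high = [98, 100, 102]
print(round(coefficient_of_variation(low), 1))   # 20.0
print(round(coefficient_of_variation(high), 1))  # 2.0
```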

A summary  of  the  characteristics,  advantages  and  disadvantages  of  the 
range,  standard  deviation  and  coefficient  of  variation  is  presented  in 
Table  2. 

Use of the descriptive statistics described in this section is
illustrated in Examples 1a and 1b. The examples use populations of
specific conductance values and fecal coliform counts, ranked in order of
increasing magnitude (Table 3).
Table 2. A summary of the characteristics, advantages and disadvantages of the range, standard
deviation and coefficient of variation.
Table 3. Populations of specific conductance (SC) and fecal coliform (FC),
ranked in order of increasing magnitude, with means (μ) of 81.5 µmhos/cm
and 23.6 counts/100 ml and standard deviations (σ) of 12.1 µmhos/cm and
12.5 counts/100 ml, respectively.

Rank    SC    FC        Rank    SC    FC
   1    51     7          51    82    21
   2    56     8          52    82    21
   3    58     9          53    82    21
   4    59    10          54    82    21
   5    61    10          55    83    22
   6    62    11          56    83    22
   7    63    11          57    83    22
   8    64    11          58    83    22
   9    65    12          59    83    23
  10    66    12          60    84    23
  11    67    12          61    84    23
  12    67    13          62    85    23
  13    68    13          63    85    24
  14    68    13          64    86    24
  15    68    13          65    86    24
  16    69    14          66    86    24
  17    69    14          67    86    25
  18    70    14          68    87    25
  19    71    14          69    87    25
  20    72    14          70    87    26
  21    72    15          71    88    26
  22    72    15          72    88    26
  23    73    15          73    88    27
  24    73    15          74    89    27
  25    74    15          75    89    27
  26    74    16          76    89    28
  27    74    16          77    90    28
  28    74    16          78    90    28
  29    75    16          79    91    29
  30    75    16          80    92    29
  31    75    16          81    92    29
  32    75    17          82    92    30
  33    76    17          83    93    30
  34    77    17          84    93    31
  35    77    17          85    94    31
  36    78    17          86    94    32
  37    78    18          87    95    32
  38    78    18          88    95    33
  39    78    18          89    96    37
  40    78    18          90    97    38
  41    79    18          91    98    40
  42    79    19          92    99    44
  43    79    19          93   100    45
  44    79    19          94   100    46
  45    80    19          95   101    49
  46    80    19          96   103    54
  47    80    20          97   105    59
  48    81    20          98   107    61
  49    81    20          99   108    67
  50    82    20         100   111    75
EXAMPLE 1a

Descriptive  Statistics 


Problem: 

Obtain  two  random  samples,  containing  15  observations  each,  from  the 
specific  conductance  and  fecal  coliform  populations  tabulated  in  Table  3. 
For  each  sample  determine  the  (a)  mean,  (b)  standard  deviation,  and  (c) 
coefficient  of  variation. 

Solution:

The  samples  were  obtained  with  the  aid  of  a random  number  (RN)  table. 


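RN    SC    FC        RN    SC    FC        RN    SC    FC
59    83    23        38    78    18        24    73    15
34    77    17         9    65    12        27    74    16
46    80    19        95   101    49        65    86    24
81    92    29        60    84    23        48    81    20
55    83    22        78    90    28        35    77    17

(RN = random number, which identifies the rank of the observation drawn
from Table 3; SC in µmhos/cm, FC in counts/100 ml.)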

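(a) Mean

$$\overline{SC} = \frac{\sum SC_i}{n} = \frac{1224}{15} = 81.6\ \mu\text{mhos/cm}$$

$$\overline{FC} = \frac{\sum FC_i}{n} = \frac{332}{15} = 22.1\ \text{counts/100 ml}$$

(b) Standard Deviation

$$s_{SC} = \sqrt{\frac{\sum x_i^2}{n-1}} = \sqrt{\frac{1049.6}{14}} = 8.7\ \mu\text{mhos/cm}$$

$$s_{FC} = \sqrt{\frac{\sum x_i^2}{n-1}} = \sqrt{\frac{1083.7}{14}} = 8.8\ \text{counts/100 ml}$$

(c) Coefficient of Variation

$$CV_{SC} = \frac{s_{SC} \times 100}{\overline{SC}} = 10.6\%$$

$$CV_{FC} = \frac{s_{FC} \times 100}{\overline{FC}} = 39.8\%$$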
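The Example 1a arithmetic can be reproduced with Python's standard `statistics` module. This sketch simply re-enters the 15 observations listed above; `statistics.stdev` uses the n - 1 divisor of Equation 3. As listed, the fecal coliform values sum to 332, which still gives a mean of 22.1 counts/100 ml.

```python
import statistics

# The 15 observations drawn in Example 1a (same order as the RN table above)
sc = [83, 78, 73, 77, 65, 74, 80, 101, 86, 92, 84, 81, 83, 90, 77]  # µmhos/cm
fc = [23, 18, 15, 17, 12, 16, 19, 49, 24, 29, 23, 20, 22, 28, 17]   # counts/100 ml

for name, sample in (("SC", sc), ("FC", fc)):
    mean = statistics.mean(sample)   # Equation 1
    s = statistics.stdev(sample)     # Equation 3 (n - 1 divisor)
    cv = 100 * s / mean              # Equation 4
    print(f"{name}: sum={sum(sample)}  mean={mean:.1f}  s={s:.1f}  CV={cv:.1f}%")
```

The printed values should agree with parts (a) through (c).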




The UNIVARIATE procedure from SAS (1979) or the CONDESCRIPTIVE
subprogram from SPSS (1975) could be used to solve this example problem.
The UNIVARIATE procedure was used, and the results are presented in Tables 4
and 5.




Table 4. SAS output for the specific conductance portion of Part 2, Example 1a.

Table 5. SAS output for the fecal coliform portion of Part 2, Example 1a.


EXAMPLE 1b

Descriptive  Statistics 


Problem: 

For  the  specific  conductance  and  fecal  coliform  populations  tabulated 
in  Table  3 determine  the  (a)  median,  (b)  mode,  and  (c)  range. 

Solution:


(a) Median

The median for the specific conductance population is the mean of
ranks 50 and 51, which is 82 µmhos/cm.

The median for the fecal coliform population is the mean of ranks 50 and
51 (20 and 21), which is 20.5 counts/100 ml.

(b) Mode

The specific conductance population does not have a mode; 78, 82 and 83
µmhos/cm are each observed five times.

The mode of the fecal coliform population is 16 counts/100 ml (observed
six times).

(c) Range

Specific conductance: 111 - 51 = 60 µmhos/cm

Fecal coliform: 75 - 7 = 68 counts/100 ml
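The same results can be checked with a short Python sketch. The two lists below are transcribed from Table 3 (ranks 1 to 100, ten per line). `statistics.multimode` returns every value tied for the highest frequency; for specific conductance it lists 78 along with 82 and 83 µmhos/cm, because 78 also occurs five times (ranks 36 to 40), which confirms that no single mode exists.

```python
import statistics

# Table 3 populations, in rank order (ten ranks per line)
sc = [
    51, 56, 58, 59, 61, 62, 63, 64, 65, 66,
    67, 67, 68, 68, 68, 69, 69, 70, 71, 72,
    72, 72, 73, 73, 74, 74, 74, 74, 75, 75,
    75, 75, 76, 77, 77, 78, 78, 78, 78, 78,
    79, 79, 79, 79, 80, 80, 80, 81, 81, 82,
    82, 82, 82, 82, 83, 83, 83, 83, 83, 84,
    84, 85, 85, 86, 86, 86, 86, 87, 87, 87,
    88, 88, 88, 89, 89, 89, 90, 90, 91, 92,
    92, 92, 93, 93, 94, 94, 95, 95, 96, 97,
    98, 99, 100, 100, 101, 103, 105, 107, 108, 111,
]  # specific conductance, µmhos/cm
fc = [
    7, 8, 9, 10, 10, 11, 11, 11, 12, 12,
    12, 13, 13, 13, 13, 14, 14, 14, 14, 14,
    15, 15, 15, 15, 15, 16, 16, 16, 16, 16,
    16, 17, 17, 17, 17, 17, 18, 18, 18, 18,
    18, 19, 19, 19, 19, 19, 20, 20, 20, 20,
    21, 21, 21, 21, 22, 22, 22, 22, 23, 23,
    23, 23, 24, 24, 24, 24, 25, 25, 25, 26,
    26, 26, 27, 27, 27, 28, 28, 28, 29, 29,
    29, 30, 30, 31, 31, 32, 32, 33, 37, 38,
    40, 44, 45, 46, 49, 54, 59, 61, 67, 75,
]  # fecal coliform, counts/100 ml

for name, pop in (("SC", sc), ("FC", fc)):
    print(name,
          "median =", statistics.median(pop),          # (a) mean of ranks 50 and 51
          "modes =", statistics.multimode(pop),        # (b) all values tied for most frequent
          "range =", max(pop) - min(pop))              # (c) largest minus smallest
```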

The UNIVARIATE procedure from SAS (1979) was also used to develop the
required descriptive statistics (Tables 6 and 7). Note that there is no
data entry for the mode of the specific conductance, since it does not
exist. The CONDESCRIPTIVE subprogram from SPSS (1975) could also have been
used; however, it will not determine the median and mode.
Table 6. SAS output for the specific conductance portion of Part 2, Example 1b.

Table 7. SAS output for the fecal coliform portion of Part 2, Example 1b.


5.0  Important  Theoretical  Probability  Density  Functions 


Statisticians  have  developed  several  theoretical  probability  density 
functions  which  can  be  used  as  models  for  samples  from  a population.  If 
the  frequency  distribution  of  observations  from  a sample  fits  one  of  these 
models,  then  (1)  errors  of  the  population  parameters  can  be  estimated,  (2) 
temporal  and  spatial  changes  in  frequency  can  be  compared  and  (3)  the 
effect  of  environmental  factors  and  management  practices  can  be  examined. 

The models commonly applied to discontinuous or count data are the
binomial and the Poisson distributions. A detailed discussion and
illustration of the application of each of these models is given by Elliott
(1977). The most important theoretical probability density functions for
continuous or measured data are the normal, t, chi-square and F
distributions. The balance of this Section will address these
distributions. It is very important that you have a clear understanding of
these distributions, since many of the procedures that follow are based on
a knowledge of these models.


5.1  Normal  Distribution 

5.1.1  Introduction 

The normal distribution is also referred to as the "Gaussian"
distribution. We are all familiar with the symmetrical bell-shaped curve
of this distribution. However, we should note that not all symmetrical
distributions are normal, since the normal distribution is defined by a
specific equation and has its own distinctive set of properties. The
equation for the normal curve is:

$$Y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{X-\mu}{\sigma}\right)^{2}} \tag{5}$$

where Y is the height of the curve corresponding to an assigned value of X
(which represents the frequency of observations of a given Xᵢ); π and
e are constants nearly equal to 3.1416 and 2.7183, respectively; and the
other parameters are as previously defined. The properties of normal
distributions are: (1) they are symmetrical about the mean (the mean,
median and mode are all equal); (2) they are concave downward within ±σ of
the mean and concave upward beyond ±σ of the mean, so the points of
inflection lie at μ ± σ; and (3) the proportion of the area between two X
values is completely governed by μ and σ.
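Equation 5 can be evaluated directly. The sketch below (the function name is illustrative) compares it with the standard library's `statistics.NormalDist.pdf` (Python 3.8 or later), using the Table 3 specific conductance parameters, and prints the ordinates at μ - σ, μ and μ + σ; the first and last should be equal, reflecting the symmetry of property (1).

```python
import math
from statistics import NormalDist

def normal_ordinate(x, mu, sigma):
    """Height Y of the normal curve at x, written out as in Equation 5."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 81.5, 12.1          # Table 3 specific conductance, µmhos/cm
dist = NormalDist(mu, sigma)    # standard-library reference for comparison

for x in (mu - sigma, mu, mu + sigma):
    # Equation 5 and NormalDist.pdf should agree at every X
    print(f"X = {x:5.1f}   Eq. 5: {normal_ordinate(x, mu, sigma):.5f}   pdf: {dist.pdf(x):.5f}")
```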

Generally,  we  are  interested  in  the  area  under  the  curve  since  we  can 
relate  it  to  the  probability  of  occurrence  of  an  interval  of  observations. 
(This  concept  of  probability  is  generally  denoted:  Pr  (argument)  = area.) 

The area contained under the curve between two observations, such as Xa
and Xb, can be determined by using Table A-1 (found in Appendix A).

In order that the same table can be used for any values of μ and σ,
which vary with different normal populations, Equation 5 was standardized
by substituting z for (X - μ)/σ prior to integration. Table A-1 is entered
with the value of z, which is simply the deviation of X from μ measured in
σ units:

$$z = \frac{X - \mu}{\sigma}$$

The area between any two Xᵢ's can be found by using the symmetry
of the curve about z = 0. The standardized normal curve is illustrated in
Figure 2. It is apparent that 68.3% of the observations lie within ±σ of
the mean, 95.5% within ±2σ of the mean and 99.7% within ±3σ of the mean.
Application of the normal distribution is illustrated in Example 2.
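In place of Table A-1, areas under the standardized normal curve can be computed with `statistics.NormalDist` (Python 3.8 or later). This sketch reproduces the percentages quoted above; to two decimals they are 68.27%, 95.45% and 99.73%, which the text rounds to 68.3%, 95.5% and 99.7%.

```python
from statistics import NormalDist

z = NormalDist()  # standardized normal curve: mean 0, standard deviation 1

for k in (1, 2, 3):
    area = z.cdf(k) - z.cdf(-k)  # Pr(-k <= z <= k), using symmetry about z = 0
    print(f"within ±{k} sigma: {100 * area:.2f}%")
```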
Figure 2. The standardized normal curve. The relationship between z and σ
is illustrated by the use of two ordinate scales.

EXAMPLE 2

Application  of  the  Normal  Curve 


Problem: 

For the specific conductance population presented in Table 3, which is
assumed to be normally distributed with mean 81.5 µmhos/cm and standard
deviation 12.1 µmhos/cm, determine (a) the probability of an observation
falling between the mean and 91 µmhos/cm, (b) the probability of a value
being less than 87 µmhos/cm, (c) the probability of a value occurring
between 77 and 94 µmhos/cm, (d) the 90% value of X, and (e) the probability
of Xᵢ = 54 µmhos/cm.

Solution: 

(a) Probability of an observation falling between the mean and
91 µmhos/cm.

First, it is necessary to convert 91 µmhos/cm to z units.

$$z = \frac{X - \mu}{\sigma} = \frac{91 - 81.5}{12.1} = 0.79$$

Second, determine the area from z = 0 to z = 0.79 using Table A-1.

Area = 0.2852 (0.5000 - 0.2148 = 0.2852). The area we are interested
in is illustrated below.

Therefore, Pr(81.5 ≤ Xᵢ ≤ 91) = 0.2852

(b) Probability of a value being less than 87 µmhos/cm.

First, convert 87 µmhos/cm to z units.

$$z = \frac{87 - 81.5}{12.1} = 0.45$$

Second, determine the area from Table A-1.

Area = 0.5000 + 0.1736 = 0.6736.

The area we are interested in is illustrated below.

Since the argument includes the entire left half of the curve as well as
part of the right half, we must add 0.5000 to the value obtained from
Table A-1.

Therefore, Pr(Xᵢ ≤ 87) = 0.6736


(c) Probability of a value occurring between 77 and 94 µmhos/cm.
In this case, the argument falls on both sides of the mean.

The area is calculated in three steps:

First, determine the z values.

$$z_{94} = \frac{94 - 81.5}{12.1} = 1.03 \qquad z_{77} = \frac{77 - 81.5}{12.1} = -0.37$$

Second, determine the areas and apply the concept of symmetry to the
curve (Table A-1). The area to the right of the center is equal to
0.3485, while the area to the left of the center is 0.1443 (see figure
below).

Third, add the two areas.

Pr(77 ≤ Xᵢ ≤ 94) = 0.3485 + 0.1443 = 0.4928
(d) The 90% value of X.

To determine the value of Xᵢ below which 90% (0.9000) of the
observations will fall, you must first find the 0.4000 value in
Table A-1, then determine z and solve for Xᵢ.

The 0.4000 value was obtained by applying the concept of symmetry
(0.4000 = 0.9000 - 0.5000).

From Table A-1 we find that z is nearly equal to 1.28.

Since:

$$z = \frac{X - \mu}{\sigma}$$

It follows that:

$$X = z\sigma + \mu$$

Substituting and solving we get:

$$X = 1.28(12.1) + 81.5 = 97.0\ \mu\text{mhos/cm}$$

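(e) Probability of Xᵢ = 54 µmhos/cm.

To apply the normal distribution to discrete data it is necessary to
treat the data as if they were continuous. Thus, the value 54 is
considered as the interval 53.5 to 54.5 µmhos/cm.

Now the problem becomes similar to that presented in item (c), except that
both limits lie below the mean, so the smaller area is subtracted from the
larger. The z values are (53.5 - 81.5)/12.1 = -2.31 and
(54.5 - 81.5)/12.1 = -2.23, and the corresponding areas from the center
(Table A-1) are 0.4896 and 0.4871.

Pr(53.5 ≤ Xᵢ ≤ 54.5) = 0.4896 - 0.4871 = 0.0025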
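All five parts of Example 2 can be checked with `statistics.NormalDist`, assuming, as the example does, a normal population with μ = 81.5 and σ = 12.1 µmhos/cm. Because the code does not round z to two decimals before finding the area, its answers may differ from the table-based values in the third decimal place (for example, about 0.284 rather than 0.2852 in part (a)).

```python
from statistics import NormalDist

sc = NormalDist(mu=81.5, sigma=12.1)  # assumed normal model of the SC population

a = sc.cdf(91) - sc.cdf(81.5)    # (a) Pr(81.5 <= X <= 91)
b = sc.cdf(87)                   # (b) Pr(X <= 87)
c = sc.cdf(94) - sc.cdf(77)      # (c) Pr(77 <= X <= 94)
d = sc.inv_cdf(0.90)             # (d) value below which 90% of observations fall
e = sc.cdf(54.5) - sc.cdf(53.5)  # (e) Pr(X = 54), treated as the interval 53.5 to 54.5

print(f"(a) {a:.4f}  (b) {b:.4f}  (c) {c:.4f}  (d) {d:.1f}  (e) {e:.4f}")
```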
5.1.2 Skewness and Kurtosis


Many  times  an  observed  frequency  distribution  will  depart  markedly 
from  a normal  distribution.  There  are  two  types  of  departures  with  which 
we  need  to  be  familiar:  skewness  and  kurtosis. 

Skewness refers to the asymmetry of the data, where one tail is drawn
out more than the other. A distribution skewed to the right has a long
right tail. Kurtosis refers to the degree of peakedness of the curve. A
distribution that has more observations near the mean and in the tails,
with fewer observations in the intermediate regions, relative to a normal
distribution with the same mean and variance is called leptokurtic. A
platykurtic curve is the opposite of the leptokurtic curve and has more
observations in the intermediate regions than at the mean or in the tails,
relative to the normal distribution.
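The text defines skewness and kurtosis only qualitatively. One common way to quantify them, assumed here rather than taken from this paper, is the moment coefficients g1 = m3/m2^(3/2) and g2 = m4/m2² - 3, where mk is the kth moment about the mean; both are 0 for a normal curve. Statistical packages often apply small-sample adjustments, so their reported values can differ somewhat from this sketch.

```python
def moment_skew_kurtosis(sample):
    """Moment coefficients of skewness (g1) and excess kurtosis (g2).

    g1 > 0: long right tail; g1 < 0: long left tail.
    g2 > 0: leptokurtic; g2 < 0: platykurtic.
    Uses n in the denominators (no small-sample adjustment).
    """
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m3 = sum((x - mean) ** 3 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

print(moment_skew_kurtosis([1, 2, 3, 4, 5]))         # symmetric and flat: g1 = 0, g2 < 0
print(moment_skew_kurtosis([1, 1, 1, 2, 2, 3, 10]))  # long right tail: g1 > 0
```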

In  general,  these  statistics  are  seldom  used  in  water  quality  data 
analysis.  However,  they  are  presented  here  simply  as  definitions  for  use 
in  future  discussions. 

5.1.3  Testing  the  Normality  of  the  Distribution 

As you recall, one of the assumptions underlying parametric statistics
is that the distribution of the population is known. Many water quality
problems have statistical answers based on the assumption that the
distribution of the population is normal. There are several methods you can
use to check this assumption. Three procedures are presented here: (1)
the graphic method, (2) the Kolmogorov-Smirnov test, and (3) the
Shapiro-Wilk test. (It should be noted that most statistics books suggest
the use of the chi-square test for goodness of fit. However, because the
Kolmogorov-Smirnov and Shapiro-Wilk tests are more powerful and are the
ones commonly contained in computer statistical packages, they are the ones
addressed here to test for normality.)

The graphic method is based on a cumulative frequency distribution. When data from a normal distribution are plotted in a cumulative manner on arithmetic graph paper, the result is a sigmoid curve. This curve can be linearized by plotting the cumulative distribution on normal probability graph paper. Figure 3 illustrates a series of frequency distributions departing from normality. You will find these useful as guides when examining the distributions of your data on probability paper. The graphic method is illustrated in Example 3.
Figure 3. Examples of several frequency distributions and their respective cumulative frequency distributions (after Sokal and Rohlf, 1969).


EXAMPLE  3 

Graphical  Test  for  Normality 


Problem: 

Graphically  test  the  populations  presented  in  Table  3 for  normality  of 
their  frequency  distribution. 

Solution: 

(a) Prepare a frequency distribution and cumulative frequency distribution for (1) specific conductance and (2) fecal coliform.


SPECIFIC CONDUCTANCE

CLASS INTERVAL 1/                         CUMULATIVE   CUMULATIVE PERCENT
LOWER LIMIT   UPPER LIMIT   FREQUENCY     FREQUENCY    FREQUENCY
(µmhos/cm)    (µmhos/cm)    (f)           (F)          (% F) 2/

   50.5          55.5           1             1             1
   55.5          60.5           3             4             4
   60.5          65.5           5             9             9
   65.5          70.5           9            18            18
   70.5          75.5          14            32            32
   75.5          80.5          15            47            47
   80.5          85.5          16            63            63
   85.5          90.5          15            78            78
   90.5          95.5          10            88            88
   95.5         100.5           6            94            94
  100.5         105.5           3            97            97
  105.5         110.5           2            99            99
  110.5         115.5           1           100           100

1/ Steel and Torrie (1960) suggest some guidelines for determining the size of the class interval: A rule of use in determining the size of the class interval when high precision is required in calculations made from the resulting frequency table is to make the interval not greater than one-quarter of the standard deviation. If this rule is strictly adhered to, the data are sometimes not sufficiently summarized for graphical presentation. If the size of the class interval is increased to one-third to one-half of a standard deviation, the resulting frequency table will usually be a sufficient summary for graphical presentation and adequate for most data; the lack of precision in any statistics calculated from the table will be small enough to be ignored.

2/ Take care to note that since 100 observations are contained in the example population, the cumulative frequency (F) and cumulative percent frequency (% F) columns are identical. However, when the total number of observations is not 100, the cumulative percent frequency (% F) is determined by dividing the cumulative frequency (F) of each class interval by the total number of observations in the sample and multiplying by 100.
FECAL COLIFORM

CLASS INTERVAL                            CUMULATIVE   CUMULATIVE PERCENT
LOWER LIMIT   UPPER LIMIT   FREQUENCY     FREQUENCY    FREQUENCY
(counts/100 ml)             (f)           (F)          (% F)

    6.5          10.5           5             5             5
   10.5          14.5          15            20            20
   14.5          18.5          21            41            41
   18.5          22.5          17            58            58
   22.5          26.5          14            72            72
   26.5          30.5          11            83            83
   30.5          34.5           5            88            88
   34.5          38.5           2            90            90
   38.5          42.5           1            91            91
   42.5          46.5           3            94            94
   46.5          50.5           1            95            95
   50.5          54.5           1            96            96
   54.5          58.5           0            96            96
   58.5          62.5           2            98            98
   62.5          66.5           0            98            98
   66.5          70.5           1            99            99
   70.5          74.5           0            99            99
   74.5          78.5           1           100           100
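The cumulative columns in step (a) can be generated mechanically from the class frequencies. A minimal sketch, assuming equal-width class intervals described by the first lower limit and a common width (the function name `cumulative_table` is illustrative):

```python
def cumulative_table(first_lower, width, freqs):
    """Rows of (lower, upper, f, F, %F) for equal-width class intervals."""
    n = sum(freqs)
    rows, running = [], 0
    for k, f in enumerate(freqs):
        lower = first_lower + k * width
        running += f                              # cumulative frequency F
        rows.append((lower, lower + width, f, running, 100.0 * running / n))
    return rows

# Specific conductance classes from the table above (umhos/cm)
conductance_freqs = [1, 3, 5, 9, 14, 15, 16, 15, 10, 6, 3, 2, 1]
for row in cumulative_table(50.5, 5.0, conductance_freqs):
    print("{:6.1f} {:6.1f} {:4d} {:4d} {:6.1f}".format(*row))
```

Because %F is computed as 100 F/n, the same function also covers samples whose size is not 100, where the F and %F columns differ.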

(b) Graph the cumulative % F versus the upper limit of each class interval (Figure 4) for specific conductance and fecal coliform on normal probability paper. Fit a straight line to each data set, giving the most weight to points occurring between cumulative frequencies of 25% and 75%.


(c) The specific conductance data follow the straight line quite well, strongly suggesting a normal distribution, while the trend of the fecal coliform population suggests that it is skewed to the right (Figure 4). Plots of the actual frequency distributions are included for your reference (Figures 5 and 6).
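Normal probability paper works by converting each cumulative proportion to its standard normal score, so the straightness of the plot can also be checked numerically. This is a minimal sketch, not the document's procedure: it assumes cumulative percentages are plotted against upper class limits, drops the 100% class (which has no finite normal score), weights all points equally rather than emphasizing the 25% to 75% range, and uses the correlation coefficient r as a rough index of straightness.

```python
from statistics import NormalDist

def probability_plot_r(upper_limits, cum_percent):
    """Correlation between upper class limits and normal scores of cumulative %."""
    std_normal = NormalDist()
    # 0% and 100% have no finite normal score, so those classes are skipped
    pts = [(x, std_normal.inv_cdf(p / 100.0))
           for x, p in zip(upper_limits, cum_percent) if 0 < p < 100]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    mz = sum(z for _, z in pts) / n
    sxz = sum((x - mx) * (z - mz) for x, z in pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    szz = sum((z - mz) ** 2 for _, z in pts)
    return sxz / (sxx * szz) ** 0.5

cond_upper = [55.5 + 5 * k for k in range(13)]
cond_cum = [1, 4, 9, 18, 32, 47, 63, 78, 88, 94, 97, 99, 100]
fc_upper = [10.5 + 4 * k for k in range(18)]
fc_cum = [5, 20, 41, 58, 72, 83, 88, 90, 91, 94, 95, 96, 96, 98, 98, 99, 99, 100]

r_cond = probability_plot_r(cond_upper, cond_cum)
r_fc = probability_plot_r(fc_upper, fc_cum)
print(f"specific conductance r = {r_cond:.4f}, fecal coliform r = {r_fc:.4f}")
```

The conductance points lie almost exactly on a line, while the fecal coliform points bend, which is the same contrast the plot on probability paper shows.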


[Figure 4: cumulative percent frequency plotted on normal probability paper; recoverable axis label: Fecal Coliform (counts/100 ml)]


Figure  5.  Frequency  diagram  for  the  conductivity  data  from  Table  3. 

Note  that  the  lines  connecting  the  plot  points  have  been  added  to  the  SPSS  plot. 
Figure 6. Frequency plot of the fecal coliform data from Table 3.

Note  that  the  lines  connecting  the  plot  points  have  been  added  to  the  SPSS  plot. 


The Shapiro-Wilk and Kolmogorov-Smirnov tests for goodness of fit are both computational procedures that allow us to compare an observed frequency distribution with the expected normal frequency distribution. The Shapiro-Wilk test (Shapiro and Wilk, 1965) should be used when the number of observations is less than or equal to 50, while the Kolmogorov-Smirnov test (Stephens, 1974) should be applied when the number of observations is greater than 50.

The Shapiro-Wilk test produces a W-statistic. The null hypothesis (no difference between the observed and the expected normal frequency distribution) is rejected (see Section 7.0) for small values of W. The Kolmogorov-Smirnov test yields a D-statistic. The null hypothesis, in this case, is rejected for large values of D.

Since both of these procedures are rather involved, the computational steps are not outlined here. Both statistics are readily available through SAS, and the Kolmogorov-Smirnov D-statistic is also available in SPSS. These are illustrated in Examples 4a and 4b.
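The D-statistic itself is simple to sketch: it is the largest vertical gap between the sample's cumulative distribution and a fitted normal cumulative distribution. This is a minimal sketch, assuming the normal mean and standard deviation are estimated from the sample (the Lilliefors variant of the test); SAS's exact procedure and its PROB > D calculation may differ. The fecal coliform values are the 100 observations listed in Example 5, stored as value: count pairs, and 0.886/√n is a commonly quoted large-sample approximation to the 5% critical value for this variant.

```python
from statistics import NormalDist, mean, stdev

# Fecal coliform counts (counts/100 ml) from Table 3, as value: number of occurrences
FC_COUNTS = {7: 1, 8: 1, 9: 1, 10: 2, 11: 3, 12: 3, 13: 4, 14: 5, 15: 5, 16: 6,
             17: 5, 18: 5, 19: 5, 20: 4, 21: 4, 22: 4, 23: 4, 24: 4, 25: 3, 26: 3,
             27: 3, 28: 3, 29: 3, 30: 2, 31: 2, 32: 2, 33: 1, 37: 1, 38: 1, 40: 1,
             44: 1, 45: 1, 46: 1, 49: 1, 54: 1, 59: 1, 61: 1, 67: 1, 75: 1}
fc = [value for value, count in FC_COUNTS.items() for _ in range(count)]

def ks_d_normal(data):
    """Largest gap between the sample CDF and a normal CDF fitted to the sample."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))   # parameters estimated from the sample
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = fitted.cdf(x)
        # compare the fitted CDF with the step just after and just before x
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

d_raw = ks_d_normal(fc)
approx_crit_05 = 0.886 / len(fc) ** 0.5
print(f"D = {d_raw:.3f}, approximate 5% critical value = {approx_crit_05:.3f}")
```

A D larger than the critical value points to rejecting normality, which agrees with the conclusion reached for the fecal coliform data in Example 4a.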
EXAMPLE 4a

Kolmogorov-Smirnov  Test  for  Normality 


Problem: 

Use the UNIVARIATE procedure from SAS (1979), which applies the Kolmogorov-Smirnov test when n > 50, to test the specific conductance and fecal coliform populations presented in Table 3 for normality.

Solution: 

The SAS output for this example is presented in Tables 8 and 9. The Kolmogorov-Smirnov test for normality results in a D-statistic (bottom of column 1 in the upper left portion of the output) and the probability of a larger D (bottom of column 2 in the upper left portion of the output). PROB > D, which ranges from 0 to 1, is the probability of obtaining a D value larger than the one printed if the population were in fact normally distributed. A large value gives us no reason to reject the hypothesis that the observed distribution does not differ from a normal distribution, while a value smaller than the chosen significance level (for example, 0.05) leads us to reject it. In the case of the specific conductance data (Table 8), we would accept the data as normally distributed since PROB > D = 1, while in the case of the fecal coliform data (Table 9), we would reject them as being normally distributed (PROB > D = 0.0256269): if the population were normal, a D value this large would occur only about 2.6% of the time.
Table 8. SAS output for the specific conductance portion of Example 4a.

Table 9. SAS output for the fecal coliform portion of Example 4a.


EXAMPLE  4b 

Shapiro-Wilk  Test  for  Normality 


Problem: 

Use the UNIVARIATE procedure from SAS (1979), which applies the Shapiro-Wilk test when n ≤ 50, to test the specific conductance and fecal coliform samples used in Example 1 for normality.

Solution:

The SAS output for this example is presented in Tables 10 and 11. The procedure produces the Shapiro-Wilk W-statistic and the probability of a smaller W (located in the upper left of the output at the bottom of columns 1 and 2). PROB < W, which ranges from 0 to 1, is the probability of obtaining a W value smaller than the one calculated if the sample had been drawn from a normally distributed population. A large value gives us no reason to reject the hypothesis of normality, while a value below the chosen significance level (for example, 0.05) leads us to reject it. In the case of the specific conductance sample, we accept it as coming from a normally distributed population since PROB < W = 0.95. The fecal coliform sample, on the other hand, has PROB < W = 0.01, which results in the rejection of the hypothesis that it was obtained from a normally distributed population.
Table 10. SAS output for the specific conductance portion of Example 4b.

Table 11. SAS output for the fecal coliform portion of Example 4b.


5.1.4  Transformations 


Now that we know how to test our data for normality, the question arises, "What do we do if our data are not normally distributed?" Basically, there are three options available: (1) determine what theoretical frequency distribution the data follow, such as the binomial or the Poisson, and proceed accordingly; (2) use nonparametric statistical methods; or (3) transform the data.

We suggest you apply a data transformation first unless you have strong reason to believe the data follow another theoretical frequency distribution. A log (X) transformation will usually normalize a water quality data set composed of continuous or measured data, especially if the data are skewed to the right. Where zero values are present, use the transformation log (X + 1) instead; this avoids the problem of taking the log of zero. Other transformations are available; however, the log (X) or log (X + 1) transformation will generally correct your problem of nonnormality. If it does not, we suggest you consult a statistician before proceeding with your analysis. Example 5 illustrates the use of the log (X) transformation.
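The choice between log (X) and log (X + 1) described above can be sketched as a small helper. This is a minimal sketch, assuming base-10 logarithms (as in Example 5) and assuming the + 1 shift is applied to every value whenever any zero is present, so the whole data set is transformed consistently; `log_transform` is an illustrative name.

```python
import math

def log_transform(values):
    """Return (transformed, shift): log10(X), or log10(X + 1) for all values if any X is 0."""
    if any(v < 0 for v in values):
        raise ValueError("log transforms require non-negative data")
    shift = 1 if any(v == 0 for v in values) else 0
    return [math.log10(v + shift) for v in values], shift

print(log_transform([7, 15, 75]))    # no zeros: plain log10(X)
print(log_transform([0, 9, 99]))     # a zero is present: log10(X + 1)
```

Returning the shift alongside the values records which transformation was used, which matters when results are later back-transformed to the original units.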
EXAMPLE 5

Log  (X)  Transformation 


Problem: 

Transform the fecal coliform data contained in Table 3 by log (X) and test the transformed data for normality, using SAS (1979) or SPSS (1975).

Solution:

1. Transform each FC_i in Table 3 to log (FC_i).


 FC   Log (FC)     FC   Log (FC)     FC   Log (FC)     FC   Log (FC)     FC   Log (FC)

  7   0.8451       15   1.1761       18   1.2553       23   1.3617       29   1.4624
  8   0.9031       15   1.1761       19   1.2788       23   1.3617       30   1.4771
  9   0.9542       15   1.1761       19   1.2788       24   1.3802       30   1.4771
 10   1.0000       15   1.1761       19   1.2788       24   1.3802       31   1.4914
 10   1.0000       15   1.1761       19   1.2788       24   1.3802       31   1.4914
 11   1.0414       16   1.2041       19   1.2788       24   1.3802       32   1.5051
 11   1.0414       16   1.2041       20   1.3010       25   1.3979       32   1.5051
 11   1.0414       16   1.2041       20   1.3010       25   1.3979       33   1.5185
 12   1.0792       16   1.2041       20   1.3010       25   1.3979       37   1.5682
 12   1.0792       16   1.2041       20   1.3010       26   1.4150       38   1.5798
 12   1.0792       16   1.2041       21   1.3222       26   1.4150       40   1.6021
 13   1.1139       17   1.2304       21   1.3222       26   1.4150       44   1.6435
 13   1.1139       17   1.2304       21   1.3222       27   1.4314       45   1.6532
 13   1.1139       17   1.2304       21   1.3222       27   1.4314       46   1.6628
 13   1.1139       17   1.2304       22   1.3424       27   1.4314       49   1.6902
 14   1.1461       17   1.2304       22   1.3424       28   1.4472       54   1.7324
 14   1.1461       18   1.2553       22   1.3424       28   1.4472       59   1.7709
 14   1.1461       18   1.2553       22   1.3424       28   1.4472       61   1.7853
 14   1.1461       18   1.2553       23   1.3617       29   1.4624       67   1.8261
 14   1.1461       18   1.2553       23   1.3617       29   1.4624       75   1.8751

2. The UNIVARIATE procedure from SAS (1979) was used to test the transformed data for normality. If SPSS (1975) had been selected, the NPAR TESTS subprogram would have been applied. The SAS output is presented in Table 12. Since PROB > D = 1, we accept the hypothesis that the data come from a population following a normal distribution.

It  should  be  noted  that  prior  to  the  log  (X)  transformation,  the  fecal 
coliform  population  was  not  normally  distributed  (Example  4a).  However, 
the  transformation  made  it  conform  to  the  normal  frequency  distribution. 
This  will  generally  occur  when  the  data  are  moderately  skewed,  as  in  this 
example  problem. 
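The effect of the transformation can also be seen in the moment coefficient of skewness, g1 = m3/m2^(3/2), which is 0 for a normal distribution. A minimal sketch, assuming the 100 fecal coliform values listed in step 1 (stored as value: count pairs) and base-10 logarithms; skewness is used here only as a simple numerical illustration, not as the test applied in this example.

```python
import math

# Fecal coliform counts (counts/100 ml) from Table 3, as value: number of occurrences
FC_COUNTS = {7: 1, 8: 1, 9: 1, 10: 2, 11: 3, 12: 3, 13: 4, 14: 5, 15: 5, 16: 6,
             17: 5, 18: 5, 19: 5, 20: 4, 21: 4, 22: 4, 23: 4, 24: 4, 25: 3, 26: 3,
             27: 3, 28: 3, 29: 3, 30: 2, 31: 2, 32: 2, 33: 1, 37: 1, 38: 1, 40: 1,
             44: 1, 45: 1, 46: 1, 49: 1, 54: 1, 59: 1, 61: 1, 67: 1, 75: 1}
fc = [value for value, count in FC_COUNTS.items() for _ in range(count)]

def skewness(data):
    """Moment coefficient of skewness g1 = m3 / m2**1.5."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

g1_raw = skewness(fc)                              # strongly right-skewed
g1_log = skewness([math.log10(x) for x in fc])     # much closer to 0
print(f"g1 raw = {g1_raw:.2f}, g1 after log10 = {g1_log:.2f}")
```

The log transformation compresses the long right tail much more than the left, which is why it suits moderately right-skewed water quality data such as these counts.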


Table 12. SAS output for Example 5.


5.2  Student's  t-Distribution 


Student's t-distribution was introduced by William Gosset, who wrote several statistical papers under the pseudonym "Student." This distribution was developed for problems involving the sample mean when the population variance is not known and the sample size is small (n < 30). With this distribution, we can make statistical statements about the population mean using only the sample mean and the sample variance.

If our sample has been randomly drawn from a normal population with mean μ and variance σ², then the statistic

$$ t = \frac{\bar{X} - \mu}{s_{\bar{X}}} $$

is distributed as Student's t-distribution with n - 1 degrees of freedom, where s_X̄ denotes the standard error of the mean. A reliable estimate of the standard error of the mean is obtained by:

$$ s_{\bar{X}} = \sqrt{\frac{s^2}{n}} = \sqrt{\frac{\sum (X_i - \bar{X})^2 / (n - 1)}{n}} $$
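The standard error and the t statistic translate directly into code. A minimal sketch using the standard-library `statistics` module, whose `stdev` uses the n - 1 divisor shown above; the sample values and the hypothesized mean are hypothetical.

```python
import math
from statistics import mean, stdev

def standard_error(data):
    """s_xbar = sqrt(s^2 / n), with s^2 the sample variance (divisor n - 1)."""
    return stdev(data) / math.sqrt(len(data))

def t_statistic(data, mu):
    """t = (xbar - mu) / s_xbar, distributed as t with n - 1 df for normal data."""
    return (mean(data) - mu) / standard_error(data)

sample = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical measurements
print(standard_error(sample), t_statistic(sample, 4.0), len(sample) - 1)
```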

The theoretical t-distribution is similar to the normal distribution in that it extends from negative to positive infinity, but it differs in shape depending on the number of degrees of freedom (ν). As the number of observations approaches infinity, Student's t-distribution approaches the normal distribution with mean 0 and variance 1.

Table A-2 lists values of t using degrees of freedom (ν) and probability (α) as arguments, where α is the total area in both tails of the curve. It is important that you remember this is a two-tailed table. Consequently, a probability of 0.05 means that 0.025 of the area is in each tail. For a one-tailed test at level α, enter Table A-2 with a probability of 2α; for example, the one-tailed 5% value is read from the 0.10 column. Values of t are commonly denoted as t_α(ν); for example, the t-value for α = 0.05 and ν = 8 is t_.05(8) = 2.306. It is very important that you understand how to use Table A-2 since you will use it often.
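The relationship between one-tailed and two-tailed lookups can be made explicit in code. A minimal sketch, assuming a small hypothetical excerpt of a two-tailed t table (standard values for 8, 13, and 20 degrees of freedom); a one-tailed level α is looked up in the 2α column.

```python
# Excerpt of two-tailed critical t values: TWO_TAILED_T[df][p] has p/2 in each tail.
TWO_TAILED_T = {
    8: {0.10: 1.860, 0.05: 2.306, 0.01: 3.355},
    13: {0.10: 1.771, 0.05: 2.160, 0.01: 3.012},
    20: {0.10: 1.725, 0.05: 2.086, 0.01: 2.845},
}

def t_critical(df, alpha, tails=2):
    """Look up t for significance level alpha, entering the two-tailed table correctly."""
    if tails == 2:
        column = alpha                   # two-tailed test: use alpha as printed
    elif tails == 1:
        column = round(2 * alpha, 10)    # one-tailed test: enter with 2 * alpha
    else:
        raise ValueError("tails must be 1 or 2")
    return TWO_TAILED_T[df][column]      # KeyError if not in this small excerpt

print(t_critical(8, 0.05))               # two-tailed 5%, 8 df
print(t_critical(13, 0.05, tails=1))     # one-tailed 5%, 13 df
```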


5.3  The  Chi-square  Distribution 

The chi-square distribution has application for problems involving the sample variance. If a sample is randomly drawn from a normal population with mean μ and variance σ², then the statistic

$$ \chi^2 = \frac{\sum (X_i - \bar{X})^2}{\sigma^2} = \frac{(n - 1)s^2}{\sigma^2} \qquad (7) $$

follows the chi-square distribution with n - 1 degrees of freedom. It should be apparent from Equation 7 that there is a relation between χ² and s², which is similar to the relation between t and X̄ just discussed.

The chi-square distribution is similar to the t-distribution in that for each number of degrees of freedom there is a specific chi-square curve. However, unlike the normal and t-distributions, chi-square cannot be negative, since it is a sum of squares divided by a positive variance.

In general, only a few values for each of the many chi-square curves are tabulated. Table A-3 presents chi-square values for given probability levels and degrees of freedom. The table is designed so that we can determine the likelihood that a calculated χ²_s will exceed the theoretical χ²_α; in other words,

$$ P(\chi^2_s > \chi^2_\alpha) = \alpha $$
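Equation 7 can be evaluated in either of its two forms; the sketch below computes both to show they agree. A minimal sketch, assuming hypothetical data and a hypothetical population variance σ² = 4; the resulting statistic would then be compared with the Table A-3 value for n - 1 degrees of freedom.

```python
from statistics import mean, variance

def chi_square_statistic(data, sigma2):
    """Equation 7: chi^2 = (n - 1) s^2 / sigma^2, with s^2 the sample variance."""
    return (len(data) - 1) * variance(data) / sigma2

sample = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data
sigma2_0 = 4.0                        # hypothetical population variance
chi2 = chi_square_statistic(sample, sigma2_0)

# Same quantity from the sum-of-squares form of Equation 7
xbar = mean(sample)
chi2_from_ss = sum((x - xbar) ** 2 for x in sample) / sigma2_0
print(chi2, chi2_from_ss, len(sample) - 1)
```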

5.4  The  F Distribution 

The chi-square distribution allows us to test various statements concerning a single variance. However, if we want to test statements concerning the variances of two populations, we must use the F statistic. It is given by the formula

$$ F = \frac{s_1^2}{s_2^2} $$

and has a sampling distribution called the F distribution. There are two sample variances involved, s₁² and s₂², estimating the variances of populations 1 and 2, respectively. Associated with each variance is its degrees of freedom: ν₁ = n₁ - 1 for the numerator and ν₂ = n₂ - 1 for the denominator. Appendix Table A-4 gives critical values of F. The table is entered with three values: α, ν₁, and ν₂. Critical F values are commonly denoted by F_α(ν₁, ν₂). The percentages listed in the table refer to the proportion of the area under the curve to the right of the values given in the table. For example, for F_.01(10,12) = 4.30, one percent of the area is to the right of 4.30.
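The F statistic and its two degrees of freedom follow directly from the two samples. A minimal sketch with hypothetical samples; the result would be compared with the Table A-4 value F_α(ν₁, ν₂) for the chosen α.

```python
from statistics import variance

def f_ratio(sample1, sample2):
    """F = s1^2 / s2^2 with (n1 - 1, n2 - 1) degrees of freedom."""
    f = variance(sample1) / variance(sample2)
    return f, len(sample1) - 1, len(sample2) - 1

f, df1, df2 = f_ratio([2, 4, 4, 4, 5, 5, 7, 9], [1, 2, 3])   # hypothetical samples
print(f"F = {f:.3f} with ({df1}, {df2}) degrees of freedom")
```

For instance, with samples of 11 and 13 observations (ν₁ = 10, ν₂ = 12), a calculated F above 4.30 would fall in the upper 1% of the F distribution.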

6.0  Confidence  Limits  About  the  Sample  Mean 

When we estimate a population parameter with a sample statistic, such as μ with X̄, we should ask: "How reliable is our estimate?" Since the real values of the population parameters in our various water quality studies will always remain unknown to us, we cannot evaluate our estimate by direct comparison. However, with the statistical methods now available to us, we can gauge the reliability of our estimate by setting a confidence interval about it.

A confidence interval consists of two confidence limits, which set an upper (U) and a lower (L) bound to the interval. The level of confidence is expressed as a probability or percent and indicates the likelihood that the interval obtained from a particular sample brackets the true parameter value (P'). This concept can be expressed as

$$ P(L \le P' \le U) = 1 - \alpha $$

where α is the significance level, so that 1 - α is the confidence level. In other words, there is a 1 - α chance that an interval constructed in this way from a random sample covers the population parameter.

Confidence limits based on the sample statistics X̄ and s² are determined using the t and χ² distributions, respectively. Confidence limits for the mean of a normally distributed population whose standard deviation is not known can be determined from a random sample using the equations:

$$ L = \bar{X} - t_\alpha(n - 1)\, s_{\bar{X}} \qquad (9) $$

and

$$ U = \bar{X} + t_\alpha(n - 1)\, s_{\bar{X}} \qquad (10) $$

An application of setting confidence limits is illustrated in Example 6.
EXAMPLE 6

Setting  Confidence  Limits  for  the  Population  Mean 


Problem: 

Twenty-one water samples were collected in a random manner at the mouth of Gypsum Creek and analyzed for total dissolved solids (TDS). The data were tested and found to be normally distributed, with a mean TDS of 517 mg/l and a standard deviation of 67 mg/l. Determine the 95% and 99% confidence limits for the population mean, μ.

Solution: 

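Case I. Determination of the 95% confidence interval about the mean.

a. Lower Limit

$$ \begin{aligned} L &= \bar{X} - t_{.05}(n-1)\,\frac{s}{\sqrt{n}} = \bar{X} - t_{.05}(20)\,\frac{s}{\sqrt{n}} \\ L &= 517 - 2.086\left(\frac{67}{\sqrt{21}}\right) = 517 - 30.5 \\ L &= 486.5\ \text{mg/l} \end{aligned} $$

b. Upper Limit

$$ \begin{aligned} U &= \bar{X} + t_{.05}(n-1)\,\frac{s}{\sqrt{n}} \\ U &= 517 + 30.5 \\ U &= 547.5\ \text{mg/l} \end{aligned} $$

Conclusion: There is a 95% chance that the population mean, μ, will be covered by the interval 486.5 to 547.5 mg/l.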

Case II. Determination of the 99% confidence interval about the mean.

a. Lower Limit

$$ \begin{aligned} L &= \bar{X} - t_{.01}(n-1)\,\frac{s}{\sqrt{n}} \\ L &= 517 - 2.845\left(\frac{67}{\sqrt{21}}\right) = 517 - 41.6 \\ L &= 475.4\ \text{mg/l} \end{aligned} $$

b. Upper Limit

$$ \begin{aligned} U &= \bar{X} + t_{.01}(n-1)\,\frac{s}{\sqrt{n}} \\ U &= 517 + 41.6 \\ U &= 558.6\ \text{mg/l} \end{aligned} $$

Conclusion: There is a 99% chance that the interval 475.4 to 558.6 mg/l will cover the population mean. Note that the 99% confidence interval is wider than the 95% one.
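Equations 9 and 10 can be reproduced for Example 6 in a few lines. A minimal sketch, assuming the two-tailed t values for 20 degrees of freedom used above (2.086 and 2.845) are supplied from Table A-2 rather than computed.

```python
import math

def t_confidence_limits(xbar, s, n, t_value):
    """Equations 9 and 10: xbar -/+ t * s / sqrt(n)."""
    half_width = t_value * s / math.sqrt(n)   # t times the standard error of the mean
    return xbar - half_width, xbar + half_width

# Example 6: n = 21, mean TDS = 517 mg/l, s = 67 mg/l; t values from Table A-2 (20 df)
lo95, hi95 = t_confidence_limits(517, 67, 21, 2.086)
lo99, hi99 = t_confidence_limits(517, 67, 21, 2.845)
print(f"95%: {lo95:.1f} to {hi95:.1f} mg/l")
print(f"99%: {lo99:.1f} to {hi99:.1f} mg/l")
```

Passing the t value in explicitly keeps the sketch tied to the tabulated values; a statistics library could supply the quantile instead.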
7.0 Hypothesis Testing


Hypothesis testing plays an integral role in statistical decision-making. A statistical hypothesis is simply a statement made about the expected results of a study, such as μ = μ0 or μ1 < μ2. The fundamental hypothesis is the null hypothesis, denoted by H0. In some cases, the null hypothesis is one of no difference. In others, it states that a given parameter does not exceed some standard or other value. In practice, we may seriously doubt the truth of H0 from the moment it is proposed. However, its purpose is to give us a starting point from which we can calculate a meaningful test statistic and come to an objective decision.

Any hypothesis that differs from the null hypothesis is called an alternate hypothesis, denoted by Ha.

Once we have established our hypotheses, H0 and Ha, we proceed to sample the population in question. These data are then analyzed statistically, and a decision is made to either accept or reject H0. Unfortunately, there is no guarantee that our decision will be correct. In fact, two mistakes are possible: (1) if the stated hypothesis is true, we might erroneously call it false (Type I error), and (2) if the stated hypothesis is false, we might erroneously call it true (Type II error). Our decision to accept or reject a hypothesis will be based on two factors: (1) the information we obtain from our sample and (2) the risk that we are willing to take that our decision may be wrong.

The Type I error is generally expressed as a probability and is denoted by α. When it is expressed as a percentage, it is termed the significance level. Consequently, a Type I error of α = 0.05 is equivalent to a significance level of 5%. The level of significance should be established prior to data collection with the following guidelines in mind. If it is a matter of serious concern when a true hypothesis is rejected, such as when the evidence will be used to shut down a logging operation, the risk of making this error, α, should be small. However, if it is important that the hypothesis be rejected when there is slight evidence against it, such as contamination of a public water supply, we should choose a larger α.

The concept of rejection or acceptance of the null hypothesis can best be illustrated with an example. Suppose we hypothesize that a body of water has a mean TDS concentration equal to or less than 80 mg/l. Therefore, H0: μ ≤ 80 mg/l and Ha: μ > 80 mg/l are the null and alternate hypotheses. We choose a significance level of 5% as the basis for rejection of H0. It is assumed that the TDS in the water body is normally distributed. A sample of 14 observations is collected from the water body in a random manner and found to have a mean of 89 mg/l and a variance of 36. The question now is: do we accept or reject H0? Before we can do this, we need to determine the critical region. As we will see later, because Ha is one-sided, the critical point in this case is defined by μ0 + t s_X̄, where t is the one-tailed 5% value for 13 degrees of freedom (1.771, read from the 0.10 column of the two-tailed Table A-2). The critical point is therefore 82.8 (80 + 1.771√(36/14)). The critical region is illustrated in Figure 7. Since 89 falls to the right of 82.8, we reject H0, which means it is not very likely that the sample obtained came from a water body with a mean TDS of 80 mg/l or less.
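The TDS example can be worked through numerically. A minimal sketch of a one-sided t test, assuming the tabulated critical t value is supplied by the user; 1.771 is the standard one-tailed 5% value for 13 degrees of freedom, and the calculation also shows that t = 5.61 exceeds the two-tailed 0.05 value of 2.160, so the decision to reject H0 does not depend on which of these two values is used.

```python
import math

def one_sided_t_test(xbar, variance, n, mu0, t_crit):
    """Test H0: mu <= mu0 against Ha: mu > mu0 using a tabulated one-tailed t value."""
    se = math.sqrt(variance / n)          # standard error of the mean
    t = (xbar - mu0) / se                 # test statistic, n - 1 df
    critical_point = mu0 + t_crit * se    # boundary of the rejection region
    return t, critical_point, xbar > critical_point

# TDS example: n = 14, mean = 89 mg/l, variance = 36, H0: mu <= 80 mg/l
t, crit, reject = one_sided_t_test(89, 36, 14, 80, 1.771)
print(f"t = {t:.2f}, critical point = {crit:.1f} mg/l, reject H0: {reject}")
print("also exceeds 2.160:", t > 2.160)
```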

When H0 has been rejected at a given significance level, we say the difference is statistically significant. The degree of statistical significance is a function of the probability level at which the difference is detected (Table 13).

Figure 7. Rejection and acceptance regions.

The significance levels of 5, 1, and 0.1% are arbitrary. However, in the case of small samples (n < 30), it is not likely that H0 will be rejected at these levels unless a very large difference exists. If this is the case, a significance level of 10% may be more appropriate (Steel and Torrie, 1960). If the null hypothesis is not rejected, the result is considered not significant and is denoted by "ns".


TABLE 13. Probability levels for various degrees of significant differences.

Probability Level      Degree of Significance    Superscript Notation 1/
.01 < p < .05          Significant               *
.001 < p < .01         Highly Significant        **
p < 0.001              Very Significant          ***

1/ Superscript notation is commonly used with a test statistic to denote the degree of significance, such as t = 2.05** to denote a highly significant difference.
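The notation in Table 13 amounts to a simple lookup on the probability level. A minimal sketch; Table 13 does not say which class a probability falling exactly on a boundary (such as p = 0.01) belongs to, so assigning such values to the less significant class is an assumption of this sketch.

```python
def significance_notation(p):
    """Superscript notation of Table 13; 'ns' when p is not below 0.05."""
    if p < 0.001:
        return "***"   # very significant
    if p < 0.01:
        return "**"    # highly significant
    if p < 0.05:
        return "*"     # significant
    return "ns"        # not significant at the 5% level

for p in (0.2, 0.03, 0.004, 0.0002):
    print(p, significance_notation(p))
```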

At  this  point,  we  would  like  to  caution  you  about  your  use  of  the  word 
"significant".  Sokal  and  Rohlf  (1969)  summarize  this  point  very  well. 
"Since statistical significance has special technical meaning (H0 rejected at P < α), we should use the adjective significant only in this sense; its use in scientific papers and reports, unless such a technical meaning is clearly implied, should be discouraged. For general descriptive purposes, synonyms such as important, meaningful, marked, noticeable, and others can serve to underscore differences and effects".

The Type II error, which is accepting the null hypothesis when it is false, is commonly denoted by β. The probability of a β error is determined by the choice of α and the distance between μ1 and μ2, as illustrated in Figure 8. It is apparent that, for a fixed sample size, a reduction of α will be accompanied by an increase in β. Consequently, we need to consider very carefully the consequences of the different types of error when we choose a level of significance (Table 14).
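To make the α-versus-β trade-off concrete, the short Python sketch below computes β for a two-sided z test, assuming for simplicity that the population standard deviation is known. The one-standard-deviation distance between μ1 and μ2 and the sample size of 10 are hypothetical values chosen for illustration, not data from this paper; the function name `type_ii_error` is likewise our own.

```python
from statistics import NormalDist

def type_ii_error(alpha, effect, n):
    """Approximate beta for a two-sided z test of H0: mu = mu1.

    Assumes sigma is known and the true mean is mu2 = mu1 + effect * sigma,
    so `effect` is the distance between mu1 and mu2 in standard deviations.
    """
    z = NormalDist()                       # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)      # two-tailed critical value
    shift = effect * n ** 0.5              # distance in standard-error units
    # beta = probability that the statistic still falls in the acceptance region
    return z.cdf(z_crit - shift) - z.cdf(-z_crit - shift)

# Hypothetical case: mu2 is one standard deviation from mu1, n = 10
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}  beta = {type_ii_error(alpha, 1.0, 10):.3f}")
```

With n held fixed, each reduction in α widens the acceptance region and the printed β rises; increasing n reduces β for any chosen α.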


Figure 8. The relationship between the Type I and Type II errors (after Steel and Torrie, 1960).
Table 14. Statistical decisions and their outcomes, with special reference to the type of error (after Steel and Torrie, 1960).

Data is from a population for which
Null hypothesis   Alternative hypothesis   Decision relative   Decision is             Probability
H0 is             Ha is                    to H0                                       should be
True              False                    Accept              Right                   High
True              False                    Reject              Wrong; Type I error     Low
False             True                     Accept              Wrong; Type II error    Low
False             True                     Reject              Right                   High


The  basic  steps  for  statistical  hypothesis  testing  can  be  summarized 
as  follows: 


1.  Establish  H0  and  Ha. 

2. Ideally, specify α and β. However, in practice, α and the sample size, n, are generally specified.

3.  Determine  the  critical  region  for  rejection  of  the  null 
hypothesis. 

4.  Compute  the  test  statistic  from  the  observed  values  obtained 
through  sampling. 

5.  Accept  or  reject  the  hypothesis  depending  on  the  position  of  the 
test  statistic  relative  to  the  critical  region. 


8.0  Testing  for  Homogeneity  of  the  Variance 

Homogeneity, or equality, of variances in a group of samples is an important prerequisite for many of the statistical tests which follow. There are two basic procedures for testing equality of the variances: (1) when only two samples are involved, we use the F test, and (2) when more than two groups are involved, Bartlett's test provides a means for evaluating the assumption. Examples 7a and 7b illustrate the computations involved in each of these procedures.
EXAMPLE 7a

Testing for Homogeneity Between Two Variances

Problem:

For the data given below, determine whether the two variances can be considered equal at a significance level of 5%.

Given:

Sampling Station   Constituent   n    X̄    s²
A                  TDS           12   68    8
B                  TDS           11   54    34

Solution:

1. Establish the hypothesis to be tested.

H0: σA² = σB²
Ha: σA² ≠ σB²

2. Select the significance level.

As stated in the problem, α = 0.05.

3. Use the test statistic Fs = sB²/sA² to test the hypothesis.

$$F_s = \frac{s_B^2}{s_A^2} = \frac{34}{8} = 4.25$$

It should be noted here that since only the right tail of the F-distribution is tabulated in Table A-4, we calculate Fs as the ratio of the greater variance over the lesser one.

4. Define the critical region.

Since the test is two-tailed, we look up the critical value F(α/2)(νB, νA), where α is the Type I error and νB = nB − 1 and νA = nA − 1 are the degrees of freedom of samples B and A, respectively. From Table A-4 we find

F0.025(10, 11) = 3.53

5. Reject or accept the null hypothesis.

Since Fs > F0.025(10, 11), we reject the null hypothesis that the variances are equal at the 5% level of significance.
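A minimal Python sketch of the variance-ratio test in Example 7a follows. The standard library has no F quantile function, so the critical value 3.53 is copied from Table A-4 as quoted above; if SciPy is available, `scipy.stats.f.ppf(0.975, 10, 11)` could supply it instead. The helper name `variance_ratio` is our own.

```python
def variance_ratio(var1, n1, var2, n2):
    """Return (Fs, df_numerator, df_denominator) with the larger variance on top."""
    if var1 >= var2:
        return var1 / var2, n1 - 1, n2 - 1
    return var2 / var1, n2 - 1, n1 - 1

# Example 7a: TDS at station A (s2 = 8, n = 12) and station B (s2 = 34, n = 11)
fs, df_num, df_den = variance_ratio(8, 12, 34, 11)
F_CRIT = 3.53  # F_0.025(10, 11) from Table A-4 (two-tailed test, alpha = 0.05)

print(f"Fs = {fs:.2f} with ({df_num}, {df_den}) d.f.")
print("reject H0" if fs > F_CRIT else "do not reject H0")
```

Putting the larger variance in the numerator is what allows a single right-tail table lookup at α/2 to serve as the two-tailed test.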
EXAMPLE 7b

Bartlett's Test of Homogeneity of Variances

Problem:

For the data given below, determine whether the variances can be considered equal at a significance level of 5%.

Given:

Sampling station   n    s²     Corrected sum of squares (SS)
A                  8    54.1   378.7
B                  15   65.3   914.2
C                  9    78.9   631.2
D                  12   69.7   766.7

Solution:

1. Establish the hypothesis to be tested.

H0: σA² = σB² = σC² = σD²
Ha: not all of the variances are equal

2. Select the significance level.

From the problem statement, α = 0.05.

3. Transform the variances to log(s²) and compute Σ(ni − 1) log(si²).

Station   log(s²)   (n − 1) log(s²)
A         1.7332    12.132
B         1.8149    25.409
C         1.8971    15.177
D         1.8432    20.275
                    Σ = 72.993

4. Sum the degrees of freedom, i.e. Σ(ni − 1), and the corrected sums of squares.

Σ(ni − 1) = 40
Σ SSi = 2690.80

5. Determine the log of the pooled within-group variance, log(s̄²).

$$\log(\bar{s}^2) = \log\left(\frac{\sum SS_i}{\sum (n_i - 1)}\right) = \log\left(\frac{2690.8}{40}\right) = 1.8278$$
6. Determine Xs² with (a − 1) d.f., where a is the number of stations or groups considered; in this example a = 4.

$$X_s^2 = 2.3026\left[\log(\bar{s}^2)\sum (n_i - 1) - \sum (n_i - 1)\log(s_i^2)\right]$$

$$X_s^2 = 2.3026\left[1.8278(40) - 72.993\right] = 0.27598$$

Note: 2.3026 transforms the common logarithms to natural logarithms.


7. Define the critical region and accept or reject H0.

The critical chi-square is defined as χ²α(a−1), which in this example (from Table A-3) is

χ².05(3) = 7.815

Since Xs² = 0.27598 < χ².05(3) = 7.815, we do not reject H0.

Conclusion:

The data from stations A, B, C and D can be considered to have equal variances at the 5% level of significance.

It should be noted that the equation for Xs² is biased slightly upward. If Xs² is nonsignificant, the bias is not important. However, if the computed Xs² is just a little above the threshold value for significance, a correction (C) for the bias should be applied as follows:

$$X_{adj}^2 = \frac{X_s^2}{C}$$

where X²adj is Xs² corrected for the bias and

$$C = 1 + \frac{1}{3(a - 1)}\left[\sum \frac{1}{n_i - 1} - \frac{1}{\sum (n_i - 1)}\right]$$

The SAS (BMDP) examples described by Ingwersen (1981a) show how to obtain the Bartlett chi-square test statistic of equal variances using SAS.
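The steps of Example 7b, including the bias correction C, can be sketched in Python as follows. This is an illustrative implementation of the formulas above, not the SAS routine; the function name `bartlett_chi_square` is our own and the critical value 7.815 is copied from Table A-3. Because it carries unrounded logarithms, it gives Xs² of about 0.275 rather than the hand value 0.27598, a difference due only to rounding.

```python
import math

def bartlett_chi_square(variances, sizes):
    """Bartlett's statistic following steps 3-6 of Example 7b, plus the bias correction C."""
    dfs = [n - 1 for n in sizes]
    total_df = sum(dfs)
    # pooled within-group variance = sum of corrected SS / sum of d.f.
    pooled = sum(df * v for df, v in zip(dfs, variances)) / total_df
    sum_df_log = sum(df * math.log10(v) for df, v in zip(dfs, variances))
    # math.log(10) plays the role of 2.3026 (common to natural logarithms)
    chi2 = math.log(10) * (total_df * math.log10(pooled) - sum_df_log)
    a = len(variances)
    c = 1 + (sum(1 / df for df in dfs) - 1 / total_df) / (3 * (a - 1))
    return chi2, chi2 / c

chi2, chi2_adj = bartlett_chi_square([54.1, 65.3, 78.9, 69.7], [8, 15, 9, 12])
CHI2_CRIT = 7.815  # chi-square_0.05 with a - 1 = 3 d.f., Table A-3
print(f"X2 = {chi2:.4f}, corrected X2 = {chi2_adj:.4f}")
print("reject H0" if chi2 > CHI2_CRIT else "do not reject H0")
```

Since C is always greater than 1, the corrected statistic is always a little smaller than the uncorrected one, which is why the correction matters only when Xs² lies just above the critical value.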
9.0 Comparison of Two Population Means or a Population Mean and a Constant

In  many  instances,  we  need  to  determine  if  a population  mean  is 
significantly  different  from  an  established  value,  such  as  a fixed  water 
quality  standard,  or  whether  two  populations  are  significantly  different  in 
an  above-versus-below  or  treatment-versus-control  study.  In  this  section, 
three  distinct  cases  are  reviewed:  (1)  comparison  of  the  population  mean 
and  a constant  when  the  variance  is  unknown,  (2)  comparison  of  the  means 
from  two  populations  when  the  variance  is  unknown  and  the  data  are 
unpaired  and  (3)  comparison  of  the  means  from  two  populations  when  the 
variance  is  unknown  and  the  data  are  paired.  In  each  case,  a brief 
description  of  the  utility  of  the  technique,  the  procedure  used  in  testing 
the  hypothesis  and  a water  quality  example  are  presented. 

Case I. Comparison of the population mean and a constant when the variance is unknown.

This  particular  case  should  be  applied  in  situations  where  we  wish  to 
compare  the  mean  value  of  a water  quality  constituent  at  a given  sampling 
site  with  a specified  constant,  such  as  a nonvariable  water  quality 
standard.  It  is  assumed  that  the  data  were  collected  in  a random  manner 
from  a normally  distributed  population.  The  procedure  for  testing  the 
hypothesis  is  as  follows: 

1. Establish H0 and Ha.

If we want to test that the population mean (μ) is equal to a specified constant (c), then H0 and Ha are:

H0: μ = c
Ha: μ ≠ c.

However, if we want to test that the population mean is not greater than the specified constant, then H0 and Ha are:

H0: μ ≤ c
Ha: μ > c.

2. Test the data for normality.

3. Select the significance level (α).

4. Compute the test statistic:

$$t_s = \frac{\bar{X} - c}{s_{\bar{X}}}, \qquad s_{\bar{X}} = \frac{s}{\sqrt{n}}$$

5. Define the critical t value, tc, with n − 1 degrees of freedom. In the instance where we are testing H0: μ = c, the test is two-tailed, while for the situation where H0: μ ≤ c, the test is one-tailed.

6. Reject or accept the null hypothesis.

An illustration of Case I is presented in Example 8a.
EXAMPLE 8a

Comparison of the Population Mean and a Constant
When the Variance is Unknown

Problem:

Twenty suspended solids (SS) samples were collected at a sampling station over three months. The concentrations of SS were determined and are as follows: 16, 29, 75, 7, 15, 24, 26, 12, 10, 23, 49, 22, 13, 14, 18, 17, 19, 31, 27 and 14. Is there reason to believe, at the 5% level of significance, that the mean SS concentration is significantly different from 28 mg/l?

Solution: 

In general, SS data are strongly skewed. To test the assumption of normality, the data were subjected to the Shapiro-Wilk test using the UNIVARIATE procedure from SAS (1979). The result was PROB < W = 0.01. Consequently, the data were transformed using log(X) and retested for normality. The result for the log(X)-transformed data was PROB < W = 0.90. This is much better: there is no evidence against normality on the log scale. Keep in mind, however, that failing to reject normality does not prove it, and with a sample this small the test has limited power to detect departures. We therefore accept the log-transformed sample as coming from a normally distributed population and proceed with the problem at hand.

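1. Establish H0 and Ha.

H0: μ = log(28)
Ha: μ ≠ log(28)

2. Select the significance level.

From the problem statement, α = 0.05.

3. Compute the test statistic, ts, where 1.2962 and 0.2348 are the mean and standard deviation of the log-transformed data and 1.4472 = log(28).

$$t_s = \frac{1.2962 - 1.4472}{0.2348/\sqrt{20}} = -2.8760$$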

4. Define the critical region.

Using Table A-2 we find:

t < t.025(19); therefore t < −2.093
and
t > t.975(19); therefore t > 2.093

5. Reject or accept H0.

Since ts lies to the left of −2.093, we do not accept the null hypothesis.

6. Conclusion.

At the 5% level of significance, the mean SS concentration is significantly different from 28 mg/l.

This  example  problem  can  also  be  solved  using  either  the  MEANS  (T,PRT) 
procedure  from  SAS  (1979)  or  the  T-TEST  (PAIRS=)  subprogram  from  SPSS 
(1975).  The  SPSS  output  for  this  problem  is  presented  in  Table  15.  It 
should  be  noted  that  the  mean  values  are  in  terms  of  log  (X). 
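For readers without SAS or SPSS, a minimal Python sketch of the Example 8a calculation is given below. It uses only the standard library; the critical value 2.093 is copied from Table A-2 rather than computed. Because it does not round intermediate values, ts comes out near −2.874, which agrees with the hand calculation to within rounding.

```python
import math
from statistics import mean, stdev

# Example 8a: suspended solids (mg/l), analyzed on the log10 scale
ss = [16, 29, 75, 7, 15, 24, 26, 12, 10, 23, 49, 22, 13, 14, 18, 17, 19, 31, 27, 14]
logs = [math.log10(x) for x in ss]

n = len(logs)
x_bar = mean(logs)
s = stdev(logs)                    # sample standard deviation (n - 1 divisor)
c = math.log10(28)                 # constant under H0, on the same scale as the data
t_s = (x_bar - c) / (s / math.sqrt(n))

T_CRIT = 2.093                     # t with 19 d.f., two-tailed alpha = 0.05 (Table A-2)
print(f"mean = {x_bar:.4f}, s = {s:.4f}, ts = {t_s:.3f}")
print("reject H0" if abs(t_s) > T_CRIT else "do not reject H0")
```

Note that the constant must be transformed with the data: comparing the mean of log(X) with 28 itself, rather than log(28), would be meaningless.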
Case II: Comparison of the means from two populations when the variance is unknown and the data are unpaired.

In some water quality studies we want to know if two population means are statistically different. An example might be assessing the impact of a road when sampling above and below it was performed, or evaluating the effect of a treated watershed against a control watershed. In addition to the assumptions for Case I, it is assumed that the variances of both populations are equal, although unknown, and that the data are not paired. The procedure for testing the hypothesis is as follows:

1. Establish H0 and Ha.

H0: μ1 = μ2
Ha: μ1 ≠ μ2

2. Test the data for normality and homogeneity among variances.

3. Select the significance level (α).

4. Compute the test statistic, ts, to test the hypothesis, where

$$t_s = \frac{d}{s_d}, \qquad d = \bar{X}_1 - \bar{X}_2, \qquad s_d = \sqrt{s_w^2\,\frac{n_1 + n_2}{n_1 n_2}}$$

and sw², which is the weighted (pooled) variance, is determined by

$$s_w^2 = \frac{s_1^2(n_1 - 1) + s_2^2(n_2 - 1)}{n_1 + n_2 - 2}$$

5. Determine the critical t value, tc. This is a two-tailed test.

$$t_c = \pm t_{\alpha/2\,(n_1 + n_2 - 2)}$$

6. Reject or accept the hypothesis.

An example of Case II is given in Example 8b.
EXAMPLE 8b

Comparison of the Means from Two Populations When
the Population Variance is Unknown and the Data Unpaired

Problem:

Water temperature (°F) was measured at the mouths of two different tributaries, A and B, to Whitefish Creek during the summer of 1980. The data obtained are summarized below.

Station A: 66, 59, 74, 60, 62, 69, 78, 71, 52, 78, 44, 50, 64

Station B: 61, 69, 67, 63, 39, 80, 63, 78, 47, 67, 72, 80, 41, 74, 65, 67, 62

Is there a significant difference (α = 0.05) between the population means of Stations A and B? It can be assumed that the populations are normally distributed and have equal variances.

Solution:

1. Establish H0 and Ha.

H0: μ1 = μ2
Ha: μ1 ≠ μ2

2. Select the significance level.

From the problem statement we know α = 0.05.

3. Compute the test statistic, ts.

First, compute X̄ and s for Stations A and B.

X̄A = 63.62
sA = 10.62
X̄B = 63.85
sB = 11.54

Now, determine ts as follows:

d = X̄A − X̄B = 63.62 − 63.85 = −0.23

$$s_w^2 = \frac{s_A^2(n_A - 1) + s_B^2(n_B - 1)}{n_A + n_B - 2} = \frac{(10.62)^2(12) + (11.54)^2(19)}{31} = 125.3$$

$$s_d = \sqrt{125.3\left(\frac{13 + 20}{13 \times 20}\right)} = 3.988$$

$$t_s = \frac{-0.23}{3.988} = -0.058$$


4. Define the critical regions.

The critical regions for this example are defined by:

t < t.025(31)
and
t > t(1 − 0.025)(31)

Therefore, the rejection regions are:

t < −2.042
and
t > 2.042

5. Reject or accept the null hypothesis.

Since ts lies between the t values of −2.042 and 2.042, we do not reject the null hypothesis.

Conclusion: There is no significant difference between the mean water temperatures at the mouths of tributaries A and B at the 5% level of significance.

This example problem could also be solved using the TTEST procedure from SAS (1979) or the T-TEST (GROUPS=) subprogram from SPSS (1975). The SPSS output for the solution of this example problem is presented in Table 16.
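The pooled-variance calculation of Case II can be sketched in Python from the summary statistics of Example 8b (n, X̄ and s for each station), so the raw readings are not needed. The function name `pooled_t` is our own, and the critical value 2.042 is the one used in the text rather than a computed quantile.

```python
import math

def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Two-sample t statistic with pooled variance (Case II, equal variances assumed)."""
    df = n1 + n2 - 2
    sw2 = (s1**2 * (n1 - 1) + s2**2 * (n2 - 1)) / df    # weighted (pooled) variance
    sd = math.sqrt(sw2 * (n1 + n2) / (n1 * n2))        # standard error of the difference
    return (mean1 - mean2) / sd, df, sw2

# Example 8b summary statistics (water temperature, deg F)
t_s, df, sw2 = pooled_t(63.62, 10.62, 13, 63.85, 11.54, 20)
T_CRIT = 2.042  # two-tailed critical value used in the text
print(f"sw2 = {sw2:.1f}, ts = {t_s:.3f} with {df} d.f.")
print("reject H0" if abs(t_s) > T_CRIT else "do not reject H0")
```

The pooled variance weights each sample variance by its degrees of freedom, so the larger sample at Station B has more influence on sw².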




Case III: Comparison of the means from two populations when the variance is unknown and the data are paired.

In some instances, extraneous factors which have no direct relation to the effect we are attempting to measure can cause significant differences between the means. We can overcome this problem somewhat by designing our sampling program so that observations are collected in pairs, i.e. when a sample is collected at Station A, one is also collected at Station B. In this type of study, we are striving to obtain pairs of observations that are alike in all respects except the one we are trying to measure. This method has the same underlying assumptions as Case II, with the exception of equal variances for each population. In other words, with this method we need not assume that the two population variances are equal. The procedure for testing the hypothesis is as follows:

1. Establish H0 and Ha.

H0: μ1 = μ2
Ha: μ1 ≠ μ2

2. Test the data for normality.

3. Select the significance level (α).

4. Use the statistic, ts, to test the hypothesis:

$$t_s = \frac{\bar{d}}{s_d/\sqrt{n}}$$

where d = X1 − X2 is the difference within each pair, d̄ is the mean of the differences, sd is the standard deviation of the differences, and n is the number of pairs.

5. Determine the critical t value, tc. This is a two-tailed test.

$$t_c = \pm t_{\alpha/2\,(n - 1)}$$

6. Compute the value of the test statistic, ts, and accept or reject the hypothesis.

An example of Case III is presented in Example 8c.
EXAMPLE 8c

Comparison of the Means from Two Populations When
the Population Variance is Unknown and the Data Paired

Problem:

Water samples were collected in pairs above (A) and below (B) a clearcut in western Oregon. The samples were analyzed for dissolved oxygen concentration in the field. The results are tabulated below. Determine if there is a significant difference (α = 0.05) in the mean dissolved oxygen concentration above and below the clearcut. It can be assumed that both populations are normally distributed.

Dissolved Oxygen Concentration (mg/l)

Station A   Station B   Difference d = A − B
6.2         5.2         1.0
6.5         5.4         1.1
6.8         5.3         1.5
7.0         5.7         1.3
6.9         5.6         1.3
7.0         6.2         0.8
6.8         5.7         1.1
6.7         5.6         1.1
6.8         5.8         1.0
6.2         5.6         0.6

Σd = 10.8
d̄ = 1.08
sd = 0.257

Solution:

1. Establish H0 and Ha.

H0: μA = μB
Ha: μA ≠ μB

2. Select the significance level.

From the problem statement we know α = 0.05.

3. Compute the test statistic, ts.

$$t_s = \frac{1.08}{0.257/\sqrt{10}} = 13.29$$
4. Define the critical regions.

t < t.025(9)
and
t > t.975(9)

Therefore, the rejection regions are:

t < −2.262
and
t > 2.262

5. Reject or accept the null hypothesis.

Since ts is greater than 2.262, the null hypothesis is rejected.

Conclusion: The dissolved oxygen concentration is significantly different above and below the clearcut at the 5% level.

This  example  problem  can  also  be  solved  using  the  MEANS(T,PRT) 
procedure  from  SAS  (1979)  or  the  T-TEST  (PAIRS=)  subprogram  from  SPSS 
(1975).  The  SPSS  output  for  the  solution  of  this  example  is  presented  in 
Table  17. 
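The paired calculation in Example 8c can be sketched in Python with the standard library alone. It works directly on the pairs, so the differences, d̄ and sd are recomputed rather than copied; the critical value 2.262 is taken from the text. Using the unrounded sd, ts comes out near 13.27.

```python
import math
from statistics import mean, stdev

# Example 8c: dissolved oxygen (mg/l) above (A) and below (B) the clearcut
above = [6.2, 6.5, 6.8, 7.0, 6.9, 7.0, 6.8, 6.7, 6.8, 6.2]
below = [5.2, 5.4, 5.3, 5.7, 5.6, 6.2, 5.7, 5.6, 5.8, 5.6]

d = [a - b for a, b in zip(above, below)]   # paired differences, d = A - B
n = len(d)
d_bar = mean(d)
s_d = stdev(d)                              # standard deviation of the differences
t_s = d_bar / (s_d / math.sqrt(n))

T_CRIT = 2.262                              # t with 9 d.f., two-tailed alpha = 0.05
print(f"d_bar = {d_bar:.3f}, s_d = {s_d:.3f}, ts = {t_s:.2f}")
print("reject H0" if abs(t_s) > T_CRIT else "do not reject H0")
```

Because each pair shares the same sampling occasion, the analysis reduces to a one-sample t test of the differences against zero, with n − 1 = 9 degrees of freedom.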
10.0 Analysis of Variance

10.1 Introduction

Analysis of variance, commonly abbreviated "anova," is a statistical method with which we can test whether two or more sample means are significantly different. This section begins with a discussion of the assumptions underlying the anova tests. Emphasis has been placed on what these assumptions mean and how violations of them affect the validity of the probability statements resulting from the anova tests. This is followed by a discussion, with examples, of several of the more commonly used anova tests: the one-way anova, the two-level nested anova and the two-way anova. In addition, methods for evaluating the differences between means are also covered.

10.2  Assumptions  Underlying  Analysis  of  Variance 

There are several assumptions underlying the anova tests. Violations of one or more of these assumptions can affect both the level of significance and the sensitivity of the test.

a. Random Sampling

It  is  assumed  that  the  observations  were  obtained  by  random  sampling. 
If  our  sampling  procedure  is  biased,  such  as  always  collecting  samples  from 
riffles  when  we  are  interested  in  the  substrate  composition  of  an  entire 
stream  bed,  we  are  likely  to  have  problems  meeting  the  other  assumptions 
that  follow  here.  If  the  other  assumptions  hold,  the  lack  of  random 
sampling  will  probably  have  little  effect  on  the  level  of  significance  and 
sensitivity  of  the  anova  test.  However,  the  chances  of  the  other 
assumptions  being  valid  when  random  sampling  was  not  used  are  not  very 
good. Consequently, as a precautionary measure, we should consider the concept of random sampling very carefully when we design our monitoring program.

b.  Independence 

It  is  assumed  that  the  error  terms  for  each  variate  are  independently 
and  identically  distributed.  If  the  error  terms  are  not  independent,  the 
validity  of  the  usual  F-test  of  significance  can  be  seriously  impaired 
(Sokal  and  Rohlf,  1969).  In  general,  if  the  samples  were  collected  in  a 
random  manner,  there  should  be  no  problem  with  independence  of  the  error 
terms.  If  lack  of  independence  is  suspected,  such  as  when  evaluating 
spatial distribution of macroinvertebrates in a stream channel, it can be
tested  using  the  "Runs"  test  (Sokal  and  Rohlf,  1969).  If  the  errors  are 
not  independent,  there  is  no  simple  way  to  correct  the  problem  short  of 
redesigning  the  study. 

c.  Homogeneity  of  Variances 

The  assumption  of  homogeneity  of  variances  is  very  important  in  the 
anova  test  because  each  sample  variance  is  considered  to  be  an  estimate  of 
the  same  parametric  error  variance.  You  are  likely  to  encounter  the 
problem  of  nonhomogeneity  of  variances  when  the  means  of  one  or  two  groups 
are  much  larger  than  the  others,  such  as  mean  SO4  concentration 
in  a lake  draining  an  area  underlain  by  granitics  as  opposed  to  one 
draining  an  area  high  in  gypsum. 

Two methods you can use to test for homogeneity of variances have already been discussed (Section 8.0): the F test, for two samples, and Bartlett's test, for situations where more than two samples are involved. For a quick "first inspection" of this assumption, you can simply check the correlation between the means and variances of the samples. If the variances increase with the means, the coefficient of variation [s(100)/X̄] will be approximately constant for each sample. If the means and variances are independent, however, this ratio will vary widely.
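A quick sketch of this first inspection in Python, using the stonefly nymph counts of Example 9b purely as illustration data, is shown below. It prints each station's mean, variance and coefficient of variation, then the correlation between means and variances; the helper `pearson` is written out by hand (our own name) to avoid depending on newer standard-library functions.

```python
from statistics import mean, variance

# Stonefly nymph counts from Example 9b, used here only to illustrate the check
stations = {
    "1": [91, 77, 86, 52, 80],
    "2": [120, 110, 93, 150, 82],
    "3": [8, 17, 20, 15, 10],
}

means = [mean(x) for x in stations.values()]
variances = [variance(x) for x in stations.values()]   # sample variances (n - 1)

for name, m, v in zip(stations, means, variances):
    cv = 100 * v ** 0.5 / m                             # coefficient of variation, %
    print(f"station {name}: mean = {m:.1f}, s2 = {v:.1f}, CV = {cv:.1f}%")

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

r = pearson(means, variances)
print(f"correlation between means and variances: r = {r:.2f}")
```

With only three groups the correlation is a rough indicator at best; a strong positive value, as here, is a signal to consider a transformation before running the anova.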

If moderate heterogeneity of variances exists, the effect on the overall test of significance is probably not too serious. However, if this condition exists and you are making comparisons with one degree of freedom, you can expect serious problems.

d. Additivity

It is assumed that the treatment (group) and environmental effects are additive. The assumption of additivity can be evaluated using Tukey's test for nonadditivity (Sokal and Rohlf, 1969). In general, if this assumption is not met by the data, it can be corrected with a data transformation.

Consider  the  data  presented  in  Table  18.  Here  we  have  two  fish  tanks, 
A and  B,  in  which  the  concentration  of  a heavy  metal  required  to  kill  50 
percent  of  the  test  fish  at  two  different  pH  levels  was  determined.  Prior 
to  the  log  (X)  transformation,  the  effects  were  multiplicative  (3  x 10  = 30 
and  3 x 20  = 60).  However,  after  the  transformation,  the  effects  became 
additive  (1.00  + 0.48  = 1.48  and  1.30  + 0.48  = 1.78). 

Table 18. 96-hour LD50 concentration (mg/l) for test fish.

               Untransformed        Log transformed
Tank           pH 4.0    pH 6.0     pH 4.0    pH 6.0
A              10        20         1.00      1.30
               (×3)      (×3)       (+0.48)   (+0.48)
B              30        60         1.48      1.78
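The change from multiplicative to additive effects in Table 18 can be checked with a few lines of Python. This simply recomputes the tabulated values; it is not a formal test of additivity such as Tukey's test.

```python
import math

# Table 18: LD50 (mg/l) for tanks A and B at pH 4.0 and 6.0
ld50 = {"A": {4.0: 10, 6.0: 20}, "B": {4.0: 30, 6.0: 60}}

ratios, log_differences = [], []
for ph in (4.0, 6.0):
    a, b = ld50["A"][ph], ld50["B"][ph]
    ratios.append(b / a)                                   # multiplicative tank effect
    log_differences.append(math.log10(b) - math.log10(a))  # same effect on the log scale
    print(f"pH {ph}: B/A = {b / a:.1f}, log10(B) - log10(A) = {log_differences[-1]:.2f}")
```

Both pH levels show the same ratio (3) on the original scale and the same difference (log10 3, about 0.48) on the log scale, which is exactly the additive structure the anova model assumes.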
10.3 Nonparametric Methods of Analysis of Variance

If we cannot transform our data to meet the assumptions of the analysis of variance, we may have to resort to a similar nonparametric method. These methods are not concerned with specific parameters, but only with the distribution of the variates. Although not discussed here, a detailed description of many of these methods is given by Sokal and Rohlf (1969).

10.4  One-Way  Classification  Analysis  of  Variance 

The  most  basic  anova  test  is  the  one-way  anova.  Higher  order  anova 
tests  are  merely  extensions  of  the  one-way  anova.  The  one-way  anova  has 
only  one  criterion  for  classification,  such  as  sampling  stations,  and,  in 
its  simplest  form,  has  an  equal  number  of  data  observations  in  each  group. 

The  conceptual  framework  of  the  one-way  classification  anova  with 
equal  sample  sizes  is  straightforward  and  easy  to  follow.  Consider  "a" 
sampling  stations  ("a"  groups)  along  a stream.  Suppose  we  have  measured 
specific  conductance  "n"  times  ("n"  observations)  at  each  station.  Each 
specific conductance observation can be denoted by Xij, which is the jth
observation  of  the  ith  station.  The  data  may  be  arranged  as  in  Table  19. 

Table 19. Data arranged for a one-way classification anova.

                          "a" groups
Observation    1      2      3     ...   i     ...   a
1              X11    X21    X31   ...   Xi1   ...   Xa1
2              X12    X22    X32   ...   Xi2   ...   Xa2
3              X13    X23    X33   ...   Xi3   ...   Xa3
...
j              X1j    X2j    X3j   ...   Xij   ...   Xaj
...
n              X1n    X2n    X3n   ...   Xin   ...   Xan
The null hypothesis that we wish to test can be stated as follows:

H0: μ1 = μ2 = μ3 = ... = μi = ... = μa

The alternative hypothesis, therefore, is

Ha: not all of the μi are equal

The test for differences in means is based on the fact that if the means of each group are greatly different, the variance of the combined groups is much larger than the variances of the separate groups. Consequently, to test for differences in means we test for differences in variances. With the one-way anova we obtain two independent estimates of the population variance; one is based on the variance within groups while the other is based on the variance among groups.

To test the null hypothesis stated above, the following calculations are required. The correction term, C, is determined by Equation 13:

$$C = \frac{\left(\sum_{i=1}^{a}\sum_{j=1}^{n} X_{ij}\right)^2}{an} \qquad (13)$$

The total sum of squares (SStotal), adjusted for the mean, is found using Equation 14:

$$SS_{total} = \sum_{i=1}^{a}\sum_{j=1}^{n} X_{ij}^2 - C \qquad (14)$$

The sum of squares attributable to groups is commonly called the between groups sum of squares (SSgroups), or groups sum of squares, and is calculated using Equation 15:

$$SS_{groups} = \frac{\sum_{i=1}^{a}\left(\sum_{j=1}^{n} X_{ij}\right)^2}{n} - C \qquad (15)$$

The sum of squares within a group is referred to as the within group sum of squares (SSwithin), as well as the residual sum of squares and/or error sum of squares. It is generally found by subtracting the between groups sum of squares from the total sum of squares (Equation 16). This can be done because the sums of squares are additive.

$$SS_{within} = SS_{total} - SS_{groups} \qquad (16)$$


The  results  of  a one-way  classification  anova  with  equal  sample  sizes 
are  generally  presented  in  an  anova  table  similar  to  Table  20.  Table  20  is 
divided  into  five  columns.  Column  (1)  identifies  the  source  of  variation 
as  among  groups,  within  groups  and  total.  Column  (2)  gives  the  degrees  of 
freedom  by  which  the  various  sums  of  squares  must  be  divided  in  order  to 
yield  estimates  of  the  variances.  Column  (3)  lists  the  sums  of  squares 
(SS)  for  the  respective  sources  of  variation.  Column  (4)  contains  the 
mean  square  (MS).  The  mean  square  is  obtained  by  dividing  the  sum  of 
squares  by  the  degrees  of  freedom.  The  two  mean  squares  obtained  are  the 
two estimates of the variance discussed earlier. Column (5) presents the
calculated  F statistic,  Fs.  It  is  defined  as  the  ratio  of  the  two 
independent  estimates  of  the  same  population  variance.  The  mean  square 
obtained  from  the  means  (between  groups)  is  always  placed  in  the  numerator 
since  we  wish  to  state  that  the  means  are  significantly  different  only  if 
they  are  significantly  more  spread  out  than  would  be  expected  for  samples 
from  the  same  population. 

Two  examples  of  the  one-way  classification  anova  with  equal  sample 
size  are  presented  in  Examples  9a  and  9b. 

Table 20. A typical ANOVA table for the one-way classification with equal sample sizes.

(1)              (2)          (3)          (4)                     (5)
Source of        Degrees of   SS           MS                      Fs
variation        freedom
among groups     a − 1        SSgroups     SSgroups/(a − 1)        MSgroups/MSwithin
within groups    a(n − 1)     SSwithin     SSwithin/[a(n − 1)]
total            an − 1       SStotal
EXAMPLE 9a

One-Way Classification ANOVA with Equal Sample Size

Problem:

Nitrate concentrations were monitored at four stations within a watershed over a one-year period (see illustration below). The results have been tabulated below the watershed illustration. Determine if there is a significant difference, at the 5% level, in the mean NO3 concentration between stations.
NITRATE CONCENTRATIONS (mg/l)

                  STATIONS
         1        2        3        4
        8.7      9.8     11.9      8.8
        8.0      8.6     15.1      7.9
        8.9      8.8     11.2      8.5
        8.0      8.3     11.9      8.1
        6.8      7.4     13.9     10.0
        6.4      9.4     13.7      7.6
        7.8      7.9     12.6     10.1
        8.4      8.9     16.3      9.2
        7.8      8.3     15.4     10.0
        7.7     11.1     14.4      8.5
        8.3      8.9     13.2     12.7
        8.3      8.2     11.8      9.6
        9.7     10.7     12.6      8.5
        6.9      7.2     12.1     10.2
        7.4      7.2     13.3      6.6
ΣX    119.10   130.70   199.40   136.30
X̄       7.94     8.71    13.29     9.09
s       0.85     1.16     1.50     1.44

Solution:

1. Establish the hypothesis to be tested.

H0: μ1 = μ2 = μ3 = μ4
Ha: not all of the μi are equal

2. Select the significance level.

From the problem statement, α = 0.05.

3.  Develop  the  anova  table  and  determine  the  test  statistic,  Fs. 

The  data  were  analyzed  using  the  ONEWAY  subprogram  from  SPSS 
(1975).  It  should  be  noted  that  either  the  ANOVA  procedure  from 
SAS  (1979)  or  the  ANOVA  subprogram  from  SPSS  (1975)  could  have 
been  used  to  solve  for  the  test  statistic.  The  results  of  the 
analysis  are  presented  in  Table  21.  From  Table  21  it  can  be  seen 
that  Fs  = 54.133. 
4. Define the critical region, Fc.

From Table A-4 we find that

Fc = F.05(3, 56) = 2.76

5. Reject or accept the null hypothesis.

Since Fs > Fc, we do not accept the null hypothesis. This means that the mean NO3 concentrations at the four stations cannot be considered equal (i.e. having come from the same population) at α = 0.05.


It should be noted that at this point we have no statistical insight as to which mean or means differ from each other. All we know is that they are not all equal.
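A minimal Python sketch of the Table 20 computations, applied to the Example 9a nitrate data, is given below. It follows Equations 13 to 16 directly; the function name `one_way_anova` is our own, and the critical value 2.76 is copied from the text rather than computed. SciPy's `scipy.stats.f_oneway` should give the same Fs if SciPy is available.

```python
# Example 9a: nitrate (mg/l), 15 observations at each of 4 stations
groups = [
    [8.7, 8.0, 8.9, 8.0, 6.8, 6.4, 7.8, 8.4, 7.8, 7.7, 8.3, 8.3, 9.7, 6.9, 7.4],
    [9.8, 8.6, 8.8, 8.3, 7.4, 9.4, 7.9, 8.9, 8.3, 11.1, 8.9, 8.2, 10.7, 7.2, 7.2],
    [11.9, 15.1, 11.2, 11.9, 13.9, 13.7, 12.6, 16.3, 15.4, 14.4, 13.2, 11.8, 12.6, 12.1, 13.3],
    [8.8, 7.9, 8.5, 8.1, 10.0, 7.6, 10.1, 9.2, 10.0, 8.5, 12.7, 9.6, 8.5, 10.2, 6.6],
]

def one_way_anova(groups):
    """One-way anova with equal sample sizes, following Equations 13-16 and Table 20."""
    a, n = len(groups), len(groups[0])
    assert all(len(g) == n for g in groups), "equal sample sizes required"
    grand_total = sum(sum(g) for g in groups)
    c = grand_total ** 2 / (a * n)                           # Eq. 13
    ss_total = sum(x * x for g in groups for x in g) - c     # Eq. 14
    ss_groups = sum(sum(g) ** 2 for g in groups) / n - c     # Eq. 15
    ss_within = ss_total - ss_groups                         # Eq. 16
    ms_groups = ss_groups / (a - 1)
    ms_within = ss_within / (a * (n - 1))
    return ms_groups / ms_within, a - 1, a * (n - 1)

f_s, df1, df2 = one_way_anova(groups)
F_CRIT = 2.76  # F_0.05(3, 56) from Table A-4
print(f"Fs = {f_s:.3f} with ({df1}, {df2}) d.f.")
print("reject H0" if f_s > F_CRIT else "do not reject H0")
```

Equations 14 and 15 each subtract the large correction term C, which can lose precision when the data are large numbers with a small spread; subtracting a convenient constant from every observation first leaves the sums of squares unchanged and avoids the problem.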
EXAMPLE 9b

One-Way Classification ANOVA with Equal Sample Size
and Data Transformed by Log (X)

Problem:

Stonefly nymphs were collected at three stations along a stream (see illustration below). The sample results are tabulated below. Determine if the mean counts are significantly different (α = 0.05) between sampling stations.


RESULTS OF STONEFLY NYMPH SAMPLES

                               STATIONS
            1                      2                      3
      Counts  log(counts)    Counts  log(counts)    Counts  log(counts)
        91      1.96           120     2.08             8      0.90
        77      1.89           110     2.04            17      1.23
        86      1.93            93     1.97            20      1.30
        52      1.72           150     2.18            15      1.18
        80      1.90            82     1.91            10      1.00
ΣX     386      9.40           555    10.18            70      5.61
X̄     77.2      1.88        111.00     2.04         14.00      1.12
s     15.09     0.09         26.31     0.10          4.95      0.17
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Sol ution: 


A common problem with count data, such as that presented here, is that
the sample variance is not independent of the mean: stations with larger
mean counts also have larger variances, which violates the anova assumption
of homogeneous variances. Generally, this can be corrected with a log (X)
transformation, or a log (X + 1) transformation when some counts are zero.
This transformation does not affect the validity of the anova and, in this
case, actually makes the results more reliable. The calculations are carried
out in the same manner as with the untransformed data.
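One quick way to see the problem described above is to compare each station's standard deviation with its mean, before and after transformation. This sketch uses the stonefly counts from the table; it takes log10 of the counts directly (so the log statistics can differ from the tabulated values in the second decimal), and `spread_ratio` is a hypothetical helper, not a formal test of homogeneity of variances.

```python
import math
from statistics import mean, stdev

# Stonefly nymph counts at Stations 1-3 (Example 9b).
counts = {
    1: [91, 77, 86, 52, 80],
    2: [120, 110, 93, 150, 82],
    3: [8, 17, 20, 15, 10],
}


def spread_ratio(groups):
    """Largest group standard deviation divided by the smallest."""
    sds = [stdev(g) for g in groups]
    return max(sds) / min(sds)


raw = list(counts.values())
# log(X) transformation; use math.log10(v + 1) instead if any count is zero.
logged = [[math.log10(v) for v in g] for g in raw]

for station, g, lg in zip(counts, raw, logged):
    print(f"Station {station}: mean = {mean(g):6.1f}, s = {stdev(g):5.2f}; "
          f"log mean = {mean(lg):.2f}, log s = {stdev(lg):.2f}")
print(f"s ratio, raw counts: {spread_ratio(raw):.1f}")
print(f"s ratio, log counts: {spread_ratio(logged):.1f}")
```

The ratio of the largest to the smallest standard deviation falls from about 5.3 for the raw counts to under 2 for the logarithms.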

The  solution  procedure  is  as  follows. 

1.  Establish  the  hypothesis  to  be  tested. 

$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a$: at least one mean is not equal to the other means


2.  Select  the  significance  level. 

From the problem statement, α = 0.05.

3.  Develop  the  anova  table  and  determine  the  test  statistic,  Fs. 

The  data  were  analyzed  using  the  same  procedures  outlined  in  step 
3 of  the  solution  of  Example  9a.  However,  in  this  case,  the 
log-transformed  data  were  input  into  the  program  rather  than  the 
raw  data.  The  results  of  the  analysis  are  presented  in  Table 
22.  From  Table  22,  it  can  be  seen  that  Fs  = 75.97. 

4.  Define  the  critical  region,  Fc. 

From  Table  A-4  we  find  that 

$F_c = F_{.05}(2, 12) = 3.88$

5.  Reject  or  accept  the  null  hypothesis. 

Since Fs > Fc, we do not accept the null hypothesis. This means
that the mean stonefly nymph counts at the three stations are not equal at
α = 0.05.
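Step 3 can be reproduced from the logarithms as printed in the table. The sketch below works from deviations about the group means and the grand mean, which is valid because every station has the same number of samples; the function name is hypothetical. With the two-decimal values it returns Fs ≈ 75.97 with (2, 12) degrees of freedom; using unrounded log10 values instead gives about 76.5, which does not change the conclusion.

```python
from statistics import mean

# log10(counts) as tabulated for Stations 1-3 (Example 9b).
logs = [
    [1.96, 1.89, 1.93, 1.72, 1.90],
    [2.08, 2.04, 1.97, 2.18, 1.91],
    [0.90, 1.23, 1.30, 1.18, 1.00],
]


def one_way_anova_equal_n(groups):
    """Fs for a one-way anova with the same number of observations per group,
    using deviations from the group means and from the grand mean."""
    a, n = len(groups), len(groups[0])
    group_means = [mean(g) for g in groups]
    grand_mean = mean(group_means)  # valid only because every group has n values
    ss_groups = n * sum((m - grand_mean) ** 2 for m in group_means)
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ms_groups = ss_groups / (a - 1)
    ms_within = ss_within / (a * (n - 1))
    return ms_groups / ms_within, (a - 1, a * (n - 1))


fs, df = one_way_anova_equal_n(logs)
print(f"Fs = {fs:.2f}, df = {df}")
```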
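In the case when groups are not composed of samples of equal size, the
required computations for the one-way classification anova are as follows.

Correction term, C.

$$C = \frac{\left(\sum_{i=1}^{a}\sum_{j=1}^{n_i} X_{ij}\right)^2}{n} \qquad (17)$$

where $n_i$ is the number of observations in group i and $n = \sum_{i=1}^{a} n_i$ is the total number of observations.

Total sum of squares adjusted for the mean, SStotal.

$$SS_{total} = \sum_{i=1}^{a}\sum_{j=1}^{n_i} X_{ij}^2 - C \qquad (18)$$

Between groups sum of squares, SSgroups.

$$SS_{groups} = \sum_{i=1}^{a}\frac{\left(\sum_{j=1}^{n_i} X_{ij}\right)^2}{n_i} - C \qquad (19)$$

Within groups sum of squares, SSwithin.

$$SS_{within} = SS_{total} - SS_{groups} \qquad (20)$$

The results of a one-way anova with unequal sample sizes are generally
presented in an anova table similar to Table 23.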


Table 23. A typical ANOVA table for the one-way classification with
unequal sample size.

| Source of Variation | Degrees of Freedom | SS | MS | Fs |
|---|---|---|---|---|
| among groups | a - 1 | SSgroups | SSgroups/(a - 1) | MSgroups/MSwithin |
| within groups | n - a | SSwithin | SSwithin/(n - a) | |
| total | n - 1 | SStotal | | |

An example of the one-way classification anova with unequal sample sizes
within groups is presented in Example 9c.
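Equations 17 through 20 and the degrees of freedom in Table 23 translate directly into code. The sketch below applies them to the log-transformed suspended solids data of Example 9c, entered as printed to two decimals; the function name `anova_unequal_n` is hypothetical. It returns (2, 31) degrees of freedom and Fs ≈ 3.95, the value reported in Example 9c.

```python
# Log-transformed suspended solids, Stations 1-3 (Example 9c).
station_logs = [
    [1.43, 1.46, 1.38, 1.68, 1.84, 1.48, 1.32, 1.83, 1.32, 1.30, 1.48, 1.87, 1.41],
    [1.69, 1.40, 1.11, 1.46, 1.66, 1.18, 1.46, 1.36, 1.18, 1.45],
    [1.23, 1.48, 1.40, 1.26, 1.36, 1.26, 1.23, 1.00, 1.30, 1.32, 1.57],
]


def anova_unequal_n(groups):
    """Table 23 entries for a one-way anova whose groups differ in size."""
    a = len(groups)
    n = sum(len(g) for g in groups)                            # total observations
    c = sum(sum(g) for g in groups) ** 2 / n                   # Eq. 17
    ss_total = sum(x * x for g in groups for x in g) - c       # Eq. 18
    ss_groups = sum(sum(g) ** 2 / len(g) for g in groups) - c  # Eq. 19
    ss_within = ss_total - ss_groups                           # Eq. 20
    ms_groups = ss_groups / (a - 1)
    ms_within = ss_within / (n - a)
    return {
        "df": (a - 1, n - a),
        "SS": (ss_groups, ss_within, ss_total),
        "Fs": ms_groups / ms_within,
    }


table = anova_unequal_n(station_logs)
print(table["df"], round(table["Fs"], 2))
```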
EXAMPLE  9c

One-Way  Classification  ANOVA  with  Unequal  Sample  Size 


Problem: 

Suspended  solids  concentration  was  monitored  at  three  stations  within 
a watershed  (see  illustration).  The  results  are  tabulated  below  the 
illustration.  Determine  if  there  is  a significant  difference  (P  = 0.10)  in 
the  mean  SS  concentration  between  stations. 


(Illustration: watershed map. Legend: sampling site; sampling station number; treatment boundary; watershed boundary.)
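| | Station 1 mg/l | Station 1 log (X) | Station 2 mg/l | Station 2 log (X) | Station 3 mg/l | Station 3 log (X) |
|---|---|---|---|---|---|---|
| | 27 | 1.43 | 49 | 1.69 | 17 | 1.23 |
| | 29 | 1.46 | 25 | 1.40 | 30 | 1.48 |
| | 24 | 1.38 | 13 | 1.11 | 25 | 1.40 |
| | 48 | 1.68 | 29 | 1.46 | 18 | 1.26 |
| | 69 | 1.84 | 46 | 1.66 | 23 | 1.36 |
| | 30 | 1.48 | 15 | 1.18 | 18 | 1.26 |
| | 21 | 1.32 | 29 | 1.46 | 17 | 1.23 |
| | 68 | 1.83 | 23 | 1.36 | 10 | 1.00 |
| | 21 | 1.32 | 15 | 1.18 | 20 | 1.30 |
| | 20 | 1.30 | 28 | 1.45 | 21 | 1.32 |
| | 30 | 1.48 | | | 37 | 1.57 |
| | 74 | 1.87 | | | | |
| | 26 | 1.41 | | | | |
| ΣX | 487 | 19.80 | 272 | 13.95 | 236 | 14.40 |
| X̄ | 37.5 | 1.52 | 27.2 | 1.40 | 21.4 | 1.31 |
| s | 20.1 | 0.21 | 12.3 | 0.20 | 7.3 | 0.15 |

Solution: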

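It is evident from the table that, before transformation, the sample
standard deviations increase with the means (from 7.3 at Station 3, which
has the lowest mean, to 20.1 at Station 1, which has the highest). After the
log (X) transformation the standard deviations are much more uniform (0.15
to 0.21), so this problem has been largely corrected.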

1.  Establish  the  hypothesis  to  be  tested. 

$H_0: \mu_1 = \mu_2 = \mu_3$

Ha:  at  least  one  mean  is  not  equal  to  the  other  means 


2.  Select  the  significance  level. 

From the problem statement, α = 0.10.

3.  Develop  the  anova  table  and  determine  the  test  statistic,  Fs. 

The  log-transformed  data  were  analyzed  using  the  ANOVA  procedure 
from  SAS  (1979).  It  should  be  noted  that  either  the  ONEWAY  or 
ANOVA  subprograms  from  SPSS  (1975)  could  have  been  used  to  solve 
for  the  test  statistic.  The  results  of  the  analysis  are 
presented  in  Table  24  where  it  can  be  seen  that  Fs  = 3.95. 
4.  Define the critical region, Fc.

From  Table  A-4  we  find  that 

$F_c = F_{.10}(2, 31) = 2.49$

5.  Accept  or  reject  the  null  hypothesis. 

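Since Fs > Fc, we do not accept the null hypothesis. This means that the
mean suspended solids concentrations at the three stations cannot be
considered equal at α = 0.10.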
Table 24. Results of the analysis of variance for Example 9c using the ANOVA procedure from SAS (1979).


10.5  Evaluating  the  Difference  Between  Means  When  the 
Sample  Sizes  are  Equal 

If we reject the null hypothesis after performing the anova test, all
we know is that there is a significant difference (at the α level selected)
between  treatment  means.  We  have  no  idea  which  group  (or  groups)  is 
responsible  for  our  rejection  of  the  null  hypothesis.  In  some  cases  we  may 
want  to  isolate  the  group  (or  groups)  which  causes  us  to  reject  the  null 
hypothesis. 

A simple  method  of  evaluating  the  difference  between  means  has  been 
developed  by  Snedecor  (1956)  and  is  called  the  Q-test.  (Much  of  what 
follows  has  been  taken  from  Nash,  1965.)  The  procedure  is  outlined  using  a 
hypothetical  data  set.  Initially,  the  means  of  the  groups  are  ranked  in 
order  from  highest  to  lowest. 


| Order | Group | Mean |
|---|---|---|
| 1 | C | 5.3 |
| 2 | A | 4.8 |
| 3 | B | 4.4 |
| 4 | D | 4.0 |


Next, we determine the difference, D, for each value of Q using Equation 21.

$$D = Q s_{\bar{x}} = Q\sqrt{\frac{s^2}{n}} \qquad (21)$$

where $s^2$ is the mean square of the error or within groups term, n is the
number of observations in each group, and Q is a factor obtained from Table
A-5, which gives the upper 5 percent points of the studentized range for
different degrees of freedom and numbers of groups. To determine which group
(or groups) is causing a significant difference between means, we use a
different value of Q depending on whether the group means are 2, 3, ...,
or a ranks apart. In our hypothetical example, assume we have 12 degrees
of freedom associated with the error term (SSwithin), an MSwithin of
0.1923, and four observations per group. Now we can determine D for each
value of Q.

$$s_{\bar{x}} = \sqrt{\frac{0.1923}{4}} = 0.2193$$

a = 2: D = 3.08(0.2193) = 0.6754

a = 3: D = 3.77(0.2193) = 0.8268

a = 4: D = 4.20(0.2193) = 0.9211

Next, we rank the means in order and determine the differences between the
highest and lowest. For simplicity, and to demonstrate the differences
between means and the comparisons with D, the values of D have been
inserted in parentheses. If the difference between two means is greater than
the value of D for that comparison, the difference between the means is
significant at the 5% level. For example, 5.3 - 4.0 = 1.3 is greater than
D = 0.9211; therefore the difference between the means of Groups C and D is
significant. However, the difference between the means of Groups A and D
(0.8, compared with D = 0.8268) is not significant, and so on.


| Group | Mean | X̄ - 4.0 | X̄ - 4.4 | X̄ - 4.8 |
|---|---|---|---|---|
| C | 5.3 | 1.3 (0.9211) | 0.9 (0.8268) | 0.5 (0.6754) |
| A | 4.8 | 0.8 (0.8268) | 0.4 (0.6754) | - |
| B | 4.4 | 0.4 (0.6754) | - | |
| D | 4.0 | - | | |
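The ranking and comparison steps above are mechanical, so they are easy to script. The sketch below takes the Q values as inputs (here the 12-df values 3.08, 3.77, and 4.20 used in the text, read from Table A-5) rather than computing the studentized range distribution; `q_test` is a hypothetical name. Because it does not round the standard error to 0.2193 first, its D values differ from those in the text in the fourth decimal, but it flags the same significant differences, C - D and C - B.

```python
import math


def q_test(means, ms_within, n, q_values):
    """Snedecor's Q-test for groups of equal size n.

    means     -- dict of group label -> mean
    ms_within -- within groups (error) mean square, s^2
    q_values  -- dict: number of means spanned (2, 3, ...) -> Q from Table A-5
    Returns (higher group, lower group, difference, D, significant) tuples.
    """
    s_xbar = math.sqrt(ms_within / n)  # standard error of a group mean
    ranked = sorted(means.items(), key=lambda kv: kv[1], reverse=True)
    results = []
    for i in range(len(ranked)):
        for j in range(len(ranked) - 1, i, -1):  # compare with the lowest mean first
            span = j - i + 1                     # number of means spanned
            d = q_values[span] * s_xbar          # Eq. 21
            diff = ranked[i][1] - ranked[j][1]
            results.append((ranked[i][0], ranked[j][0], round(diff, 2), round(d, 4), diff > d))
    return results


# Hypothetical example from the text: MSwithin = 0.1923, four observations
# per group, 12 df; Q values for 12 df as read from Table A-5.
means = {"A": 4.8, "B": 4.4, "C": 5.3, "D": 4.0}
results = q_test(means, 0.1923, 4, {2: 3.08, 3: 3.77, 4: 4.20})
for row in results:
    print(row)
```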

An  example  of  the  Snedecor  Q-test  is  presented  in  Example  9d. 
EXAMPLE  9d

Evaluating  the  Differences  Between  Means  Using  Snedecor's  Q-test 


Problem: 

When we evaluated the NO3 data presented in Example 9a, it was
determined that there was a significant difference, at the 1% level, in the
mean NO3 concentrations between stations. Now evaluate the differences
between the means and determine which ones differ significantly (P = 0.05)
using Snedecor's Q-test.

Solution:

1.  Rank  the  means  in  order  from  highest  to  lowest. 


| Rank | Station | Mean |
|---|---|---|
| 1 | 3 | 13.29 |
| 2 | 4 | 9.09 |
| 3 | 2 | 8.71 |
| 4 | 1 | 7.94 |


2.  Determine the difference, D, using Equation 21 for each value of Q.
Q is obtained from Table A-5.

$$D = Q s_{\bar{x}}$$

where

$$s_{\bar{x}} = \sqrt{\frac{1.60}{15}} = 0.3266$$

a = 2: D = 2.83(0.3266) = 0.92

a = 3: D = 3.40(0.3266) = 1.11

a = 4: D = 3.74(0.3266) = 1.22

Note: Since Table A-5 does not list 56 df, 60 df was used to obtain Q.
The difference in Q between 40 and 60 df is very minor.

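3.  Rank the means in order and determine the differences between the
highest and lowest.

| Rank | Station | Mean | X̄ - 7.94 | X̄ - 8.71 | X̄ - 9.09 |
|---|---|---|---|---|---|
| 1 | 3 | 13.29 | 5.35 (1.22)** | 4.58 (1.11)** | 4.20 (0.92)** |
| 2 | 4 | 9.09 | 1.15 (1.11)** | 0.38 (0.92) | - |
| 3 | 2 | 8.71 | 0.77 (0.92) | - | - |
| 4 | 1 | 7.94 | - | - | - |

** Significant at the 5% level (difference greater than D).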
4.  Interpret the results.

As you recall, if the difference between two means is greater than
the value of D for that comparison, the difference between the means
is significant at the 5% level. The results of this analysis indicate
that the mean at Station 3 is significantly different from the means at
all the other stations, while the mean at Station 4 is significantly
different from the means at Stations 1 and 3. In other words, the
intensely grazed pasture is yielding an average NO3 concentration
significantly greater than any of the other treatments, while the second
home development is significantly different from the pasture and one of
the forest stations at the 5% level.
10.6  Evaluating the Difference Between Means When the Sample Sizes are Not Equal

In some cases you may want to evaluate the difference between means
when the sample sizes are not equal. Although this cannot be done easily
by hand, it can be performed readily in SAS using PROC DUNCAN or in
SPSS using the Duncan option of subprogram ONEWAY, both of which implement
Duncan's multiple range test. These methods are summarized in SAS (1979)
and SPSS (1975).

10.7  Two-Level  Nested  Analysis  of  Variance 

In studies where we take only a single water quality measurement at a
station per visit, we can never be certain that the observed differences in
water quality between stations are due solely to environmental or treatment
factors; part of the difference may be experimental error. The only way to
separate these two effects is to take two or more measurements (replicate
samples) at a station per visit. If we do not find any significant
differences among the replicate samples at a station, we can then ascribe
the differences among the stations to environmental or treatment factors.

In the two-level nested anova each group is subdivided into randomly
chosen subgroups. Consider the situation where we have three sampling
stations, 1, 2, and 3, on a stream where we are studying dissolved solids
concentration. Each time we sampled at a station, we obtained three
replicate water samples (these samples represent our randomly chosen
subgroups) for total dissolved solids determination. The data are
symbolized in Table 25. Each dissolved solids determination is denoted by
$X_{ijk}$, where i represents the group or station (i = 1, 2, ..., a), j the
sample number (j = 1, 2, ..., n), and k the randomly chosen subgroup
(k = 1, 2, ..., b).

Table 25. Data arranged for a two-level nested anova.

| Station or group (a = 3) | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 3 | 3 | 3 | 3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Subgroup (b = 4) | 1 | 2 | 3 | 4 | 1 | 2 | 3 | 4 | 1 | 2 | 3 | 4 |
| Sample 1 | X₁₁₁ | X₁₁₂ | X₁₁₃ | X₁₁₄ | X₂₁₁ | X₂₁₂ | X₂₁₃ | X₂₁₄ | X₃₁₁ | X₃₁₂ | X₃₁₃ | X₃₁₄ |
| Sample 2 | X₁₂₁ | X₁₂₂ | X₁₂₃ | X₁₂₄ | X₂₂₁ | X₂₂₂ | X₂₂₃ | X₂₂₄ | X₃₂₁ | X₃₂₂ | X₃₂₃ | X₃₂₄ |
| Sample 3 | X₁₃₁ | X₁₃₂ | X₁₃₃ | X₁₃₄ | X₂₃₁ | X₂₃₂ | X₂₃₃ | X₂₃₄ | X₃₃₁ | X₃₃₂ | X₃₃₃ | X₃₃₄ |
| Sample 4 | X₁₄₁ | X₁₄₂ | X₁₄₃ | X₁₄₄ | X₂₄₁ | X₂₄₂ | X₂₄₃ | X₂₄₄ | X₃₄₁ | X₃₄₂ | X₃₄₃ | X₃₄₄ |
| Sample 5 | X₁₅₁ | X₁₅₂ | X₁₅₃ | X₁₅₄ | X₂₅₁ | X₂₅₂ | X₂₅₃ | X₂₅₄ | X₃₅₁ | X₃₅₂ | X₃₅₃ | X₃₅₄ |

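The equations required for the two-level nested anova with equal sample
size are as follows:

Correction term, C.

$$C = \frac{\left(\sum_{i=1}^{a}\sum_{k=1}^{b}\sum_{j=1}^{n} X_{ijk}\right)^2}{nba} \qquad (22)$$

Total sum of squares, SStotal.

$$SS_{total} = \sum_{i=1}^{a}\sum_{k=1}^{b}\sum_{j=1}^{n} X_{ijk}^2 - C \qquad (23)$$

Between groups sum of squares, SSgroups.

$$SS_{groups} = \frac{\sum_{i=1}^{a}\left(\sum_{k=1}^{b}\sum_{j=1}^{n} X_{ijk}\right)^2}{nb} - C \qquad (24)$$

Subgroups within groups sum of squares, SSsubgr.

$$SS_{subgr} = \frac{\sum_{i=1}^{a}\sum_{k=1}^{b}\left(\sum_{j=1}^{n} X_{ijk}\right)^2}{n} - \frac{\sum_{i=1}^{a}\left(\sum_{k=1}^{b}\sum_{j=1}^{n} X_{ijk}\right)^2}{nb} \qquad (25)$$

Within groups sum of squares, SSwithin.

$$SS_{within} = SS_{total} - SS_{groups} - SS_{subgr} = \sum_{i=1}^{a}\sum_{k=1}^{b}\sum_{j=1}^{n} X_{ijk}^2 - \frac{\sum_{i=1}^{a}\sum_{k=1}^{b}\left(\sum_{j=1}^{n} X_{ijk}\right)^2}{n} \qquad (26)$$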


The anova table for the two-level nested anova is illustrated in
Table 26.

Table 26. The anova table for the two-level nested anova.

| Source of variation | Degrees of freedom | SS | MS | Fs |
|---|---|---|---|---|
| among groups | a - 1 | SSgroups | SSgroups/(a - 1) | MSgroups/MSsubgr |
| among subgroups within groups | a(b - 1) | SSsubgr | SSsubgr/[a(b - 1)] | MSsubgr/MSwithin |
| within subgroups | ab(n - 1) | SSwithin | SSwithin/[ab(n - 1)] | |
| total | abn - 1 | SStotal | | |

In  the  case  of  unequal  sample  sizes,  the  required  computations  for  the 
two-level  nested  analysis  of  variance  are  somewhat  different.  The 
equations  to  use  can  be  readily  found  in  any  good  statistics  book,  such  as 
Sokal  and  Rohlf  (1969). 

An  example  of  the  two-level  nested  anova  is  presented  in  Example  9e. 
EXAMPLE  9e

Two-Level  Nested  ANOVA 


Problem: 

Nitrate  concentration  was  monitored  at  two  stations  along  Trout  Creek. 
One  station  was  located  above  and  the  other  below  the  outfall  from  a sewage 
lagoon  servicing  a campground  (see  illustration  below).  The  objective  of 
the  study  was  to  determine  the  onsite  effect  of  the  outfall  on  the  nitrate 
concentration  of  Trout  Creek.  At  the  outset  of  the  study  there  was  some 
concern  over  the  precision  of  the  analytical  method  as  well  as  the  sample 
collection  procedures.  As  a result,  it  was  decided  that  the  samples  would 
be  collected  in  replicate  in  order  to  separate  the  treatment  effect  from 
the  experimental  error.  The  sample  results  are  presented  below.  Determine 
if  there  is  a significant  difference  (P  = 0.05)  between  stations  and  within 
stations. 


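| Station a, replicate 1 | Station a, replicate 2 | Station a, replicate 3 | Station b, replicate 1 | Station b, replicate 2 | Station b, replicate 3 |
|---|---|---|---|---|---|
| 1.0 | 1.1 | 1.1 | 5.1 | 5.3 | 5.1 |
| 1.6 | 1.5 | 1.6 | 6.0 | 5.8 | 6.1 |
| 1.3 | 1.3 | 1.3 | 5.8 | 5.9 | 5.9 |
| 1.4 | 1.3 | 1.3 | 6.5 | 6.5 | 6.4 |
| 1.5 | 1.5 | 1.6 | 6.7 | 6.6 | 6.8 |
| 2.0 | 1.8 | 1.9 | 6.1 | 6.1 | 6.1 |
| 2.1 | 2.0 | 2.0 | 6.9 | 6.8 | 6.9 |
| 1.7 | 1.8 | 1.7 | 5.5 | 5.5 | 5.6 |
| 1.6 | 1.6 | 1.6 | 5.4 | 5.4 | 5.4 |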
Solution:

1.  Establish the hypotheses to be tested.

a. Site (stations).

$H_0: \mu_a = \mu_b$
$H_a: \mu_a \neq \mu_b$

b. Experimental error (replicate samples within stations).

$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a$: at least one replicate mean is not equal to the others

2.  Select the level of significance.

From the problem statement, α = 0.05.

3.  Develop the anova table and determine the test statistics, Fs.

The data were analyzed using the NESTED procedure from SAS (1979).
The results of the analysis are presented in Table 27. The test
statistic for the stations (site) is

$$F_{s(\text{site})} = \frac{266.66667}{0.22181} = 1202.25$$

while the test statistic for the replications (num) is

$$F_{s(\text{num})} = \frac{0.00370}{0.22181} = 0.02$$

4.  Define the critical regions, Fc.

From Table A-4 we find that

a. Site

$$F_{c(\text{site})} = F_{.05}(1, 48) = 4.04$$

b. Replications

$$F_{c(\text{num})} = F_{.05}(4, 48) = 2.56$$

5.  Accept or reject the null hypothesis.

Since Fs(site) > Fc(site), we reject the null hypothesis that the
mean NO3 concentrations at the two sites are equal. However, since
Fs(num) < Fc(num), we do not reject the null hypothesis that the mean
NO3 concentrations of the replicate samples within stations are equal.
Table 27. Results of the analysis of variance for Example 9e using the NESTED procedure from SAS (1979).
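The mean squares quoted in step 3 can be checked with Equations 22 to 26. In this sketch the stations are the groups (a = 2), the three replicate sample numbers are the subgroups (b = 3), and the nine values in each replicate column are the n observations; this arrangement is inferred from the degrees of freedom in the solution. The function name and the station keys "a" and "b" are illustrative. Following the example, both effects are divided by the within subgroups mean square; Table 26 instead tests groups against the subgroups mean square, and that ratio is also printed for comparison (both ratios far exceed their critical values here).

```python
# NO3 concentrations for Example 9e: data[station] is a list of the three
# replicate sample numbers, each holding the nine tabulated values.
data = {
    "a": [
        [1.0, 1.6, 1.3, 1.4, 1.5, 2.0, 2.1, 1.7, 1.6],
        [1.1, 1.5, 1.3, 1.3, 1.5, 1.8, 2.0, 1.8, 1.6],
        [1.1, 1.6, 1.3, 1.3, 1.6, 1.9, 2.0, 1.7, 1.6],
    ],
    "b": [
        [5.1, 6.0, 5.8, 6.5, 6.7, 6.1, 6.9, 5.5, 5.4],
        [5.3, 5.8, 5.9, 6.5, 6.6, 6.1, 6.8, 5.5, 5.4],
        [5.1, 6.1, 5.9, 6.4, 6.8, 6.1, 6.9, 5.6, 5.4],
    ],
}


def nested_anova(groups):
    """Mean squares for a two-level nested anova with equal sizes (Eqs. 22-26).

    groups: list of a groups, each a list of b subgroups of n observations.
    """
    a, b, n = len(groups), len(groups[0]), len(groups[0][0])
    values = [x for g in groups for sub in g for x in sub]
    c = sum(values) ** 2 / (n * b * a)                                  # Eq. 22
    ss_total = sum(x * x for x in values) - c                           # Eq. 23
    group_term = sum(sum(map(sum, g)) ** 2 for g in groups) / (n * b)
    subgroup_term = sum(sum(sub) ** 2 for g in groups for sub in g) / n
    ss_groups = group_term - c                                          # Eq. 24
    ss_subgr = subgroup_term - group_term                               # Eq. 25
    ss_within = ss_total - ss_groups - ss_subgr                         # Eq. 26
    return {
        "groups": ss_groups / (a - 1),
        "subgr": ss_subgr / (a * (b - 1)),
        "within": ss_within / (a * b * (n - 1)),
    }


ms = nested_anova(list(data.values()))
fs_site = ms["groups"] / ms["within"]         # ratio used in the example
fs_num = ms["subgr"] / ms["within"]
fs_site_table26 = ms["groups"] / ms["subgr"]  # ratio prescribed by Table 26
print({k: round(v, 5) for k, v in ms.items()})
print(f"Fs(site) = {fs_site:.2f}, Fs(num) = {fs_num:.3f}, "
      f"MSgroups/MSsubgr = {fs_site_table26:.0f}")
```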


10.8  Two-Way  Classification  Analysis  of  Variance 


The  two-way  classification  anova  allows  us  to  evaluate  the  effects  of 
two  factors,  such  as  stations  and  seasons,  simultaneously.  It  is  assumed 
in  this  method  that  each  factor  contributes  to  the  water  quality  and  that 
the  two  factors  add  their  effects  without  influencing  each  other. 

Care  should  be  taken  in  the  design  of  your  data  analysis  so  that  you 
do  not  confuse  a two-level  nested  anova  with  a two-way  anova.  Consider 
Table  28.  Here  we  have  several  sampling  stations  for  which  we  have 
replicate  samples,  denoted  by  1 and  2.  Replicate  samples  1 and  2 are 
simply  arbitrary  designations  for  two  randomly  selected  samples  at  each 
station. Replicate sample 1 at station 1 has no closer relation to
replicate sample 1 at station 2 than it does to replicate sample 2 at station 1.


Table 28. Basic design of the two-level nested anova.

| Stations | 1 | 2 | 3 | ... | a |
|---|---|---|---|---|---|
| Replicate samples | 1, 2 | 1, 2 | 1, 2 | ... | 1, 2 |

Now consider Table 29, an example of a two-way anova. Here we have
several sampling stations at which we have collected samples during two
seasons, spring and fall. Why can we not rearrange the seasons into a
nested design? We cannot do this because the seasons are common to all
sites within the study. If we nested the seasons at each station, this
would imply that the two seasons at each station were random samples from
all possible seasons and that spring at Station 1 is not the same as
spring at Station 2.
Table 29. Basic design of a two-way anova.

| Season | Station 1 | Station 2 | Station 3 | ... | Station a |
|---|---|---|---|---|---|
| Spring | | | | | |
| Fall | | | | | |


Sokal  and  Rohlf  (1969)  state  that  the  critical  question  to  be  asked 
is  always,  "Does  the  arrangement  of  the  data  into  a two-way  table  falsely 
imply  a correspondence  across  classes?"  If  it  does,  and  we  recognize  that 
the  factor  represents  only  random  subdivisions  of  the  groups  of  another 
factor,  then  we  have  a nested  anova.  If  there  is  correspondence  across 
groups,  the  two-way  design  is  appropriate. 

Although  there  are  many  different  designs  of  a two-way  anova,  the  one 
you  are  most  likely  to  need  is  the  two-way  anova  with  replicate  samples. 

The basic design of the two-way anova with replicate samples is illustrated
in Table 30. The data are classified two ways, by station (column) and by
season (row). In this design we want to test for differences in the means
among stations and among seasons, and to assess the interaction of the two
factors.

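Table 30. Total dissolved solids concentration (mg/l) at Stations 001
and 002 on Eagle Creek during the spring and fall of 1980.

| Season (r = 2) / Station (c = 2) | 001 | 002 |
|---|---|---|
| Spring | 340 | 512 |
| Spring | 390 | 560 |
| Spring | 381 | 550 |
| Fall | 612 | 917 |
| Fall | 630 | 920 |
| Fall | 633 | 915 |

The equations required for the two-way anova with replicate samples
are as follows.

Correction term, C.

$$C = \frac{\left(\sum_{i=1}^{a}\sum_{k=1}^{b}\sum_{j=1}^{n} X_{ijk}\right)^2}{abn} \qquad (27)$$

Total sum of squares, SStotal.

$$SS_{total} = \sum_{i=1}^{a}\sum_{k=1}^{b}\sum_{j=1}^{n} X_{ijk}^2 - C \qquad (28)$$

Subgroups sum of squares, SSsubgr.

$$SS_{subgr} = \frac{\sum_{i=1}^{a}\sum_{k=1}^{b}\left(\sum_{j=1}^{n} X_{ijk}\right)^2}{n} - C \qquad (29)$$

Sum of squares of columns, SSA.

$$SS_A = \frac{\sum_{i=1}^{a}\left(\sum_{k=1}^{b}\sum_{j=1}^{n} X_{ijk}\right)^2}{bn} - C \qquad (30)$$

Sum of squares of rows, SSB.

$$SS_B = \frac{\sum_{k=1}^{b}\left(\sum_{i=1}^{a}\sum_{j=1}^{n} X_{ijk}\right)^2}{an} - C \qquad (31)$$

Interaction sum of squares, SSA×B.

$$SS_{A \times B} = SS_{subgr} - SS_A - SS_B \qquad (32)$$

Within subgroups sum of squares, SSwithin.

$$SS_{within} = SS_{total} - SS_{subgr} \qquad (33)$$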
The anova table for the two-way anova with replicate samples is
illustrated in Table 31.

Table 31. The anova table for the two-way anova with replicate samples.

| Source of variation | Degrees of freedom | SS | MS | Fs |
|---|---|---|---|---|
| Subgroups | ab - 1 | SSsubgr | SSsubgr/(ab - 1) | |
| A (columns) | a - 1 | SSA | SSA/(a - 1) | MSA/MSwithin |
| B (rows) | b - 1 | SSB | SSB/(b - 1) | MSB/MSwithin |
| A × B (interaction) | (a - 1)(b - 1) | SSA×B | SSA×B/[(a - 1)(b - 1)] | MSA×B/MSwithin |
| Within subgroups | ab(n - 1) | SSwithin | SSwithin/[ab(n - 1)] | |
| Total | abn - 1 | SStotal | | |

An example of the two-way anova with replication is presented in Example
9f.
EXAMPLE  9f

Two-Way  Classification  ANOVA 


Problem: 

Suspended solids concentration was monitored at the mouths of two
adjacent watersheds, A and B, over a period of one year. Watershed A has
been 30% clearcut while Watershed B is fully vegetated. The results are
tabulated below. Determine if there is a significant difference (α = 0.01)
in the mean SS concentration between watersheds (1) on an annual basis and
(2) by season (spring-summer vs. fall-winter).

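Suspended Solids Concentration

| Season | Watershed A | Watershed B |
|---|---|---|
| Spring-Summer | 60 | 27 |
| Spring-Summer | 75 | 22 |
| Spring-Summer | 83 | 25 |
| Spring-Summer | 69 | 24 |
| Spring-Summer | 58 | 26 |
| Spring-Summer | 89 | 29 |
| Fall-Winter | 57 | 20 |
| Fall-Winter | 45 | 21 |
| Fall-Winter | 59 | 17 |
| Fall-Winter | 61 | 15 |
| Fall-Winter | 38 | 18 |
| Fall-Winter | 40 | 19 |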

Solution: 

1.  Establish the hypotheses to be tested:

a. Stations

$H_0: \mu_A = \mu_B$
$H_a: \mu_A \neq \mu_B$

b. Seasons

$H_0: \mu_{SS} = \mu_{FW}$
$H_a: \mu_{SS} \neq \mu_{FW}$

2.  Select the significance level.

From the problem statement, α = 0.01.

3.  Develop  the  anova  table  and  determine  the  test  statistics,  Fs. 

The  data  were  analyzed  using  the  ANOVA  subprogram  from  SPSS  (1975). 

It  should  be  noted  that  the  ANOVA  procedure  from  SAS  could  have  been 
used  to  solve  for  the  test  statistics.  The  results  of  the  analysis 
are  presented  in  Table  32.  From  Table  32  it  can  be  seen  that  the  test 
statistic  for  the  seasons  is  equal  to  19.481,  the  test  statistic  for 
the  stations  is  137.944  and  the  test  statistic  for  the  interaction 
between  station  and  season  is  5.149. 

4.  Define  the  critical  regions,  Fc. 

Since  the  degrees  of  freedom  for  the  season,  station  and  interaction 
are  all  equal  to  1,  we  find  from  Table  A-4 

$F_c = F_{.01}(1, 20) = 8.10$

5.  Reject  or  accept  the  null  hypothesis. 

Since Fs > Fc for both the seasons and the stations, we do not
accept the null hypotheses established in step 1 of the solution.

These results indicate that the suspended solids concentration differs
significantly (α = 0.01) between stations and between seasons. The
interaction effect between season and station is not significant
(Fs = 5.149 < Fc = 8.10).
Table 32. Results of the analysis of variance for Example 9f using the ANOVA subprogram from SPSS (1975).
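The test statistics quoted in step 3 can be reproduced with Equations 27 to 33. In this sketch the watersheds are the columns (factor A) and the seasons are the rows (factor B), with six values per cell as tabulated; the nested-dictionary layout and the function name are illustrative assumptions. It returns Fs values of about 137.944, 19.481, and 5.149 for watersheds, seasons, and interaction, matching those reported from SPSS.

```python
# Suspended solids for Example 9f: cells[season][watershed] holds the six values.
cells = {
    "spring-summer": {"A": [60, 75, 83, 69, 58, 89], "B": [27, 22, 25, 24, 26, 29]},
    "fall-winter": {"A": [57, 45, 59, 61, 38, 40], "B": [20, 21, 17, 15, 18, 19]},
}


def two_way_anova(cells):
    """Fs values for a two-way anova with n replicates per cell (Eqs. 27-33).

    cells: dict mapping each row (factor B) to a dict mapping each
    column (factor A) to its list of n replicate values.
    """
    rows = list(cells)
    cols = list(cells[rows[0]])
    a, b = len(cols), len(rows)
    n = len(cells[rows[0]][cols[0]])
    cell_sum = {(r, k): sum(cells[r][k]) for r in rows for k in cols}
    values = [x for r in rows for k in cols for x in cells[r][k]]

    c = sum(values) ** 2 / (a * b * n)                                              # Eq. 27
    ss_total = sum(x * x for x in values) - c                                       # Eq. 28
    ss_subgr = sum(t * t for t in cell_sum.values()) / n - c                        # Eq. 29
    ss_a = sum(sum(cell_sum[r, k] for r in rows) ** 2 for k in cols) / (b * n) - c  # Eq. 30
    ss_b = sum(sum(cell_sum[r, k] for k in cols) ** 2 for r in rows) / (a * n) - c  # Eq. 31
    ss_ab = ss_subgr - ss_a - ss_b                                                  # Eq. 32
    ss_within = ss_total - ss_subgr                                                 # Eq. 33

    ms_within = ss_within / (a * b * (n - 1))
    return {
        "A (columns)": ss_a / (a - 1) / ms_within,
        "B (rows)": ss_b / (b - 1) / ms_within,
        "A x B": ss_ab / ((a - 1) * (b - 1)) / ms_within,
    }


for source, fs in two_way_anova(cells).items():
    print(f"{source}: Fs = {fs:.3f}")
```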


11.0  Regression  and  Correlation 


11.1  Introduction 

Regression  and  correlation  are  powerful  statistical  methods  which  are 
commonly  used  in  water  quality  data  analysis.  With  regression,  we 
establish  a functional  relation  of  one  variable  upon  another.  Correlation, 
which  is  often  confused  with  regression,  is  a measurement  of  the  amount  of 
association  between  two  variables. 

This section begins with a detailed discussion of simple linear
regression. Initially, the assumptions underlying simple linear regression
are examined. Next, the procedures for computing (1) the regression line,
(2) the significance of the regression line, (3) correlation coefficients,
and (4) confidence limits about the regression line and a point estimate
are outlined. This is followed by a brief discussion of covariance, in which
I review how it can be used to test (1) whether the simple linear regression
lines are significantly different and (2) whether their slopes are the same.

Next, multiple linear regression is covered, including the procedures for
computing (1) the regression line, (2) the significance of the regression
line, and (3) the coefficient of multiple determination.

Finally,  a brief  discussion  of  curvilinear  regressions  is  presented. 

11.2  Simple  Linear  Regression 
11.2.1  Assumptions 

There  are  several  assumptions  underlying  the  method  of  simple  linear 
regression.  They  are  listed  below  along  with  the  methods  to  test  each. 


1.  The relationship of the two variables, Y to X, is linear, that is,
Y = a + bX. To test this assumption, the first step is to
develop  a scatter  diagram.  A scatter  diagram  is  simply  a plot  of 
the  raw  data.  By  observation,  you  should  be  able  to  tell  whether 
or  not  a strong  linear  relationship  exists  between  Y and  X.  If 
the  relationship  does  not  appear  to  be  linear,  then  another  model 
should  be  considered.  However,  if  it  does  appear  to  follow  a 
linear  relation,  then  the  next  step  is  to  develop  the  regression 
line  and  test  its  significance  (these  procedures  are  outlined  in 
Sections  11.2.2  and  11.2.3,  respectively). 

2.  The  departures  (errors  or  residuals)  of  the  sample  observations 
from  the  regression  line  are  independent.  To  test  this 
assumption,  a time-sequence  plot  of  departures,  commonly  called  a 
residual  plot,  is  developed  (Figure  9).  If  the  residuals  are 
independent,  then  the  points  should  be  evenly  scattered  around 
the  zero  residual  line. 

3.  The  variance  of  the  observed  Y values  around  the  regression  line 
is  constant  over  the  range  of  X values.  This  assumption  can  be 
tested  by  observation  of  a plot  of  the  residuals  (errors)  against 
the predicted Y values, commonly denoted Ŷ (Figure 10). If the
variance  is  homogeneous  over  the  range  of  X values,  the  points 
should  be  evenly  scattered  around  the  zero  residual  line. 

4.  The  values  of  X are  assumed  to  be  measured  without  error. 

5.  Although normality of the data is not required for the regression
procedure, the F test for significance of the regression line and
the t-test for computation of the confidence limits about the
regression line do assume that the variation of the observed Y values
about the regression line follows a normal distribution. This can be
tested using the standard tests for normality discussed in Section
5.1.3.

Figure 9. Testing for independence of the residuals (residual plotted against time). Case A
represents independence while Case B illustrates non-independence
of the residuals.

Figure 10. Plots testing for homogeneity in the variances.
Plot A represents homogeneity while Plot B is heterogeneous.
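The residual plots used to check assumptions 2 and 3 need only the fitted values and residuals. The sketch below computes them for the Table 33 data (the least squares fit itself is described in Section 11.2.2), assuming the samples are numbered in time order; it prints the values for plotting rather than drawing Figures 9 and 10, and the function name is hypothetical.

```python
# Table 33 data, assumed to be listed in sampling (time) order.
x = [16, 78, 89, 45, 63, 71, 97, 112, 34, 120]
y = [18, 68, 92, 48, 45, 65, 80, 105, 28, 108]


def least_squares_residuals(x, y):
    """Fit Y = a + bX by least squares and return (fitted, residual) pairs."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return [(a + b * xi, yi - (a + b * xi)) for xi, yi in zip(x, y)]


pairs = least_squares_residuals(x, y)

# Assumption 2: residuals in time order (compare with Figure 9).
for t, (_, e) in enumerate(pairs, start=1):
    print(f"sample {t:2d}: residual {e:7.2f}")

# Assumption 3: residuals against the fitted values Y-hat (compare with Figure 10).
for y_hat, e in sorted(pairs):
    print(f"Y-hat {y_hat:6.1f}: residual {e:7.2f}")
```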


11.2.2  Least Squares Determination of the Simple Linear Regression Line

The purpose of regression, as you recall, is to establish a functional
relationship  of  one  variable  with  one  or  more  other  variables.  In  simple 
linear  regression,  we  estimate  the  relationship  of  one  variable,  Y,  with 
another,  X,  by  expressing  Y in  terms  of  a linear  function  of  X.  For 
illustrative  purposes,  consider  the  data  for  two  water  quality  variables,  X 
and  Y,  which  are  paired  by  sampling  date  and  were  collected  at  the  same 
station  (Table  33).  Because  of  budgetary  cutbacks  we  can  no  longer  afford 
to  sample  both  parameters  as  frequently  as  in  the  past.  We  would  like  to 
know  if  we  can  predict  Y accurately  if  we  only  measure  X. 

It is obvious from Figure 11, which is a scatter diagram of the data
from Table 33, that a strong linear relationship exists between X and Y. A
straight line could be fitted through these data by eye to show the
relationship between the two variables. However, this method is not very accurate.

A simple  mathematical  procedure  we  can  use  to  establish  the  regression 
line  is  the  method  of  least  squares.  As  you  recall,  the  general  equation 
of  a straight  line  is 

Y = a + bX 

where:  a = the Y intercept (the value of Y when X = 0), and

b = a coefficient  establishing  the  slope  of  the  line. 
Table 33. Data for two water quality variables, X and Y.a/

Sample Number      X       Y        XY        X²
      1           16      18       288       256
      2           78      68      5304      6084
      3           89      92      8188      7921
      4           45      48      2160      2025
      5           63      45      2835      3969
      6           71      65      4615      5041
      7           97      80      7760      9409
      8          112     105     11760     12544
      9           34      28       952      1156
     10          120     108     12960     14400
Sum (n = 10)     725     657     56822     62805

a/ Also included is other information, XY and X², necessary to determine
the linear regression line.

Figure 11. Scatter diagram of X vs. Y for the data from Table 33.

The objective of the least squares method is to establish the values
for the coefficients "a" and "b" which will yield a line for which the sum
of the squared deviations from the observed Y values to the straight line
is the least possible. The method involves the use of two normal equations
(Equations 34 and 35) which are solved simultaneously. The procedure for
solving for the coefficients "a" and "b" is as follows.

$$\sum Y = an + b\sum X \qquad (34)$$

$$\sum XY = a\sum X + b\sum X^2 \qquad (35)$$

where n is the number of observations.

1.  Substitute  the  appropriate  values  from  Table  33  into  the  normal 
equations. 

657  = 10a  + 725  b (36a) 

56822  = 725a  + 62805  b (36b) 

2. Divide Equation 36a by 10 and Equation 36b by 725 to reduce the
coefficient of "a" to 1.0 in each.

65.7  = a + 72.5  b (36c) 

78.375  = a + 86.628  b (36d) 

3.  Subtract  Equation  36d  from  Equation  36c  and  solve  for  b. 

    65.7   = a + 72.5 b       (36c)
 -(78.375  = a + 86.628 b)    (36d)
  -12.675  =    -14.128 b     (37)

Therefore: b = 0.897


4.  Now,  substitute  the  value  of  "b"  into  either  Equation  36a  or  36b 
and  solve  for  "a". 

a = 0.656 

Y = 0.656  + 0.897  X (38) 

Using Equation 38 we can now predict a Y value for any value of X. A note
of caution is in order here: be very careful about extrapolating your
estimates beyond the range of X over which the linear relation was
developed.
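As a check on the hand calculation, the short Python sketch below solves Equations 34 and 35 for the Table 33 data by eliminating "a" algebraically rather than by the divide-and-subtract steps above. The helper name fit_line is illustrative only. Carried at full floating-point precision it gives b ≈ 0.897 and a ≈ 0.654; the small difference from a = 0.656 in Equation 38 comes from the rounded intermediate values (86.628 and 14.128) used in the hand solution.

```python
# Table 33 data
X = [16, 78, 89, 45, 63, 71, 97, 112, 34, 120]
Y = [18, 68, 92, 48, 45, 65, 80, 105, 28, 108]


def fit_line(x, y):
    """Solve normal equations (34) and (35) for the intercept a and slope b.

    fit_line is an illustrative helper name, not part of any package.
    """
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(u * v for u, v in zip(x, y))
    sum_x2 = sum(u * u for u in x)
    # Eliminate a between  sum_y = a*n + b*sum_x  and  sum_xy = a*sum_x + b*sum_x2
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n  # back-substitute into Equation 34
    return a, b


a, b = fit_line(X, Y)
print(f"Y = {a:.3f} + {b:.3f} X")
```

To predict, evaluate a + b * x only for X values inside the observed range (16 to 120 here), in keeping with the caution above.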




11.2.3  Testing  the  Significance  of  the  Simple  Linear  Regression  Line 


At  this  point  in  our  regression  analysis  we  should  ask  "How  well  does 
the  regression  line  fit  the  data?"  To  answer  this  question  we  need  to 
consider  the  total  variation  in  the  Y data  about  its  mean.  By  fitting  the 
regression  line  to  the  data  we  have,  in  effect,  attempted  to  explain  part 
of  this  variation  by  the  linear  association  of  Y with  X.  The  portion  of 
the  variation  that  remains,  that  of  Y about  the  regression  line,  is  called 
the residual or unexplained variation. When we test the significance of the
regression line, we are seeking to determine whether the portion of the variation of
Y that  is  explained  by  the  regression  line  is  significantly  greater  than 
the  portion  of  the  variation  of  Y that  is  unexplained. 

To test the significance of the regression line we use anova. The
total sum of squares for Y, corrected for the mean, is denoted by SStotal
and estimates the amount of variation of individual Y's about Ȳ. The
amount of variation in Y that is associated with the regression on X is
called the reduction or regression sum of squares, SSreg (Equation 39).

$$SS_{reg} = \frac{(\sum xy)^2}{\sum x^2} \qquad (39)$$

where

$$\sum xy = \sum XY - \frac{(\sum X)(\sum Y)}{n}$$

$$\sum x^2 = \sum X^2 - \frac{(\sum X)^2}{n}$$

and

$$SS_{total} = \sum y^2 = \sum Y^2 - \frac{(\sum Y)^2}{n}$$

The portion of the total variation in Y that is not associated with the
regression is called the residual sum of squares, SSres (Equation 40).

$$SS_{res} = SS_{total} - SS_{reg} \qquad (40)$$

The anova table for the significance of regression line test is given in
Table 34. As in the anova testing discussed earlier, we use the residual
or error variation as the standard for testing the variation explained by
the regression. The calculated F statistic, Fs, is compared with the
tabular F, Fα(ν1, ν2), with ν1 = 1 and ν2 = n - 2 degrees of freedom, to
test for significance.

Table 34. The anova table for the significance of regression line test.

Source of Variation   Degrees of Freedom   SS       MS       Fs
Regression            1                    SSreg    s²Ŷ      s²Ŷ/s²YX
Residual (error)      n-2                  SSres    s²YX
Total                 n-1                  SStot    s²Y

Freese (1962) points out that if there is a significant difference,
this does not mean that the line we fitted gives the best possible
description of the data, nor does it mean we have found the true
mathematical relationship between the two variables. All it allows us to
do is state, with a particular degree of probability (1 - α), that the part
of the variation in Y that is explained by the fitted line is significantly
greater than the part that is unexplained.
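The quantities in Table 34 can be computed directly from the raw data. The sketch below is a minimal illustration using the Table 33 data; regression_anova is a hypothetical helper name. Because F with 1 and ν degrees of freedom equals the square of the two-tailed t with ν degrees of freedom, the sketch takes the 5 percent critical value F0.05(1, 8) as the square of the Table A-2 t value for 8 df (2.306² ≈ 5.32) instead of reading an F table.

```python
# Table 33 data
X = [16, 78, 89, 45, 63, 71, 97, 112, 34, 120]
Y = [18, 68, 92, 48, 45, 65, 80, 105, 28, 108]


def regression_anova(x, y):
    """Return the Table 34 entries for a simple linear regression of y on x.

    regression_anova is a hypothetical helper name used for illustration.
    """
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sxx = sum(v * v for v in x) - sum_x ** 2 / n                # corrected sum x^2
    syy = sum(v * v for v in y) - sum_y ** 2 / n                # SStotal (corrected sum y^2)
    sxy = sum(u * v for u, v in zip(x, y)) - sum_x * sum_y / n  # corrected sum xy
    ss_reg = sxy ** 2 / sxx                                     # Equation 39
    ss_res = syy - ss_reg                                       # Equation 40
    df_reg, df_res = 1, n - 2
    ms_reg = ss_reg / df_reg
    ms_res = ss_res / df_res                                    # residual mean square s^2_YX
    return {
        "SS_reg": ss_reg, "SS_res": ss_res, "SS_total": syy,
        "df": (df_reg, df_res), "F": ms_reg / ms_res,
    }


table = regression_anova(X, Y)
f_crit = 2.306 ** 2  # F_0.05(1, 8) as t^2 (Table A-2, 8 df, 0.05 column)
print(f"Fs = {table['F']:.1f}, F_0.05(1, 8) = {f_crit:.2f}")
print("significant" if table["F"] > f_crit else "not significant")
```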

11.2.4  Correlation 

The term used to indicate the degree of correlation is the coefficient
of correlation, r. The coefficient of correlation varies between -1.0 and
+1.0 and measures the degree of linear association between two variables.
If all the observed Y values lie on the regression line, then r = +1.0 for
a line with positive slope and r = -1.0 for a line with negative slope. If
r = 0, there is no linear correlation at all between the two variables. In
other words, when r = 0, a straight line equal to Ȳ or X̄ would describe
the relationship equally well.

Figure 12 illustrates several different examples of coefficients of
correlation.
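The paper does not give a computing formula for r. The sketch below uses the usual product-moment formula, r = Σxy / √(Σx² Σy²), built from the corrected sums defined under Equation 39; its sign follows the slope, and its square equals SSreg/SStotal. The data are those of Table 33, and correlation is an illustrative helper name.

```python
import math

# Table 33 data
X = [16, 78, 89, 45, 63, 71, 97, 112, 34, 120]
Y = [18, 68, 92, 48, 45, 65, 80, 105, 28, 108]


def correlation(x, y):
    """Product-moment correlation coefficient r from corrected sums (illustrative helper)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sxx = sum(v * v for v in x) - sum_x ** 2 / n
    syy = sum(v * v for v in y) - sum_y ** 2 / n
    sxy = sum(u * v for u, v in zip(x, y)) - sum_x * sum_y / n
    return sxy / math.sqrt(sxx * syy)


r = correlation(X, Y)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
print(correlation([1, 2, 3], [30, 20, 10]))  # points on a falling straight line
```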


Closely associated with the coefficient of correlation is the
coefficient of determination, denoted by r². It is a measure of the
proportion of the total variation in Y that is associated with the
regression of Y on X. Consequently, an r² of 0.64 means that 64 percent of
the variation in Y was associated with X. The coefficient of
determination can be found using Equation 41.

$$r^2 = \frac{SS_{reg}}{SS_{tot}} \qquad (41)$$

Figure 12. Several different examples of coefficients of correlation.

11.2.5 Setting Confidence Intervals About a Simple Linear Regression
Line and a Point Estimate.

Confidence limits on the regression line can be established by
specifying several values over the range of X and computing the lower (L)
and upper (U) limits using Equations 42 and 43.

$$L = \hat{Y}_i - t_{\alpha(\nu)}\sqrt{s_{YX}^2\left(\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum x^2}\right)} \qquad (42)$$

$$U = \hat{Y}_i + t_{\alpha(\nu)}\sqrt{s_{YX}^2\left(\frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum x^2}\right)} \qquad (43)$$

where Ŷi is the predicted value of Y for a particular Xi, ν is the
degrees of freedom for the residual mean square (s²YX), and the other
terms are as previously defined.

It should be pointed out that these are confidence limits on the
regression of Y on X. In other words, they indicate the limits for a band
which will cover the true mean of Y for a given X, unless a chance event
of probability α has occurred (Freese, 1962). These limits do not apply to
a single predicted value of Y. The limits which will cover a single Y are
given by Equations 44 and 45.

$$L = \hat{Y}_i - t_{\alpha(\nu)}\sqrt{s_{YX}^2\left(1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum x^2}\right)} \qquad (44)$$

$$U = \hat{Y}_i + t_{\alpha(\nu)}\sqrt{s_{YX}^2\left(1 + \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum x^2}\right)} \qquad (45)$$
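Equations 42 through 45 are evaluated one X value at a time. The sketch below assumes the Table 33 data and the two-tailed t value for ν = n - 2 = 8 df at the 0.05 level from Table A-2 (2.306); confidence_limits is a hypothetical helper. It returns the predicted Ŷ, the limits for the mean of Y (Equations 42 and 43), and the wider limits for a single Y (Equations 44 and 45).

```python
import math

# Table 33 data
X = [16, 78, 89, 45, 63, 71, 97, 112, 34, 120]
Y = [18, 68, 92, 48, 45, 65, 80, 105, 28, 108]
T_05_8DF = 2.306  # Table A-2: probability 0.05, 8 df


def confidence_limits(x, y, x_i, t):
    """Return (Y_hat, limits for mean Y, limits for a single Y) at x_i (hypothetical helper)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((v - x_bar) ** 2 for v in x)
    syy = sum((v - y_bar) ** 2 for v in y)
    sxy = sum((u - x_bar) * (v - y_bar) for u, v in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    s2_yx = (syy - sxy ** 2 / sxx) / (n - 2)  # residual mean square
    y_hat = a + b * x_i
    lever = (x_i - x_bar) ** 2 / sxx
    half_mean = t * math.sqrt(s2_yx * (1 / n + lever))        # Equations 42-43
    half_single = t * math.sqrt(s2_yx * (1 + 1 / n + lever))  # Equations 44-45
    return (y_hat,
            (y_hat - half_mean, y_hat + half_mean),
            (y_hat - half_single, y_hat + half_single))


for x_i in (34, 72.5, 120):
    y_hat, mean_ci, single_ci = confidence_limits(X, Y, x_i, T_05_8DF)
    print(f"X = {x_i}: Y_hat = {y_hat:.1f}, "
          f"mean {mean_ci[0]:.1f} to {mean_ci[1]:.1f}, "
          f"single {single_ci[0]:.1f} to {single_ci[1]:.1f}")
```

The half-widths grow as Xi moves away from X̄ = 72.5, which is why confidence limits plotted about a regression line are curved rather than straight.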
11.2.6 Example


An  example  of  how  to  apply  simple  linear  regression  analysis  is 
presented  in  Example  10. 


EXAMPLE 10

Simple  Linear  Regression  Analysis 


Problem: 

Suspended solids concentrations were monitored at Stations A and B
(see illustration below) over a two-year period. The results are tabulated
below the illustration. You are to (1) develop a scatter diagram using the
raw data (data from Subwatershed B are to be assigned X values), (2) fit a
simple linear regression line to the data, (3) test the significance of the
regression (α = 0.05), (4) determine the coefficient of determination, and
(5) set the 95% confidence limits about the predicted values.

Suspended Solids Concentrations (mg/l)

Subwatershed A    Subwatershed B
      14                41
      92               260
      54               230
      66               170
     109               351
      62               249
      28                61
      36                87
      41               180
      23               104
       5                20
      97               289


Solution: 


1.  The  scatter  diagram  was  constructed  by  hand  and  is  illustrated  in 
Figure  14.  It  is  readily  apparent  that  a linear  regression  can  be  fit 
to  these  data. 

2.  Parts  (2)  through  (5)  of  the  Example  were  solved  using  the  GLM  program 
from  SAS  (1979).  [It  should  be  noted  that  the  REGRESSION  program  from 
SPSS  (1975)  could  also  have  been  used  to  solve  this  problem.  However, 
REGRESSION  will  not  calculate  the  confidence  limits  about  the 
predicted  values  like  GLM  does.]  The  results  of  the  GLM  program  are 
presented  in  Table  35. 


The simple linear regression line for this example problem is

SSa = 1.5844 + 0.2977 SSb

where SSa and SSb are the suspended solids concentrations (mg/l) at
Subwatersheds A and B, respectively.

The output clearly indicates that the regression is highly
significant: Fs = 84.08, while Fc = F0.05(1,10) = 4.96.

The coefficient of determination, r², is equal to 0.89.

The confidence limits about the predicted values are tabulated at the
bottom of Table 35 and have been plotted in Figure 13. The plotted
confidence limits are not straight lines because the farther X lies from
its mean, the wider the interval at a given probability level.
Table 35. The GLM output from SAS (1979) for Example 10.
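Readers without SAS can check the main figures quoted above for parts (2) to (4) with a few lines of Python. This sketch assumes the A and B concentrations are paired row by row as listed in the problem statement; with that pairing it gives the intercept, slope, Fs, and r² reported above. It does not reproduce the GLM confidence-limit listing, and simple_regression is an illustrative helper name.

```python
# Example 10 suspended solids (mg/l), assumed paired in the order listed
ss_a = [14, 92, 54, 66, 109, 62, 28, 36, 41, 23, 5, 97]          # Y, Subwatershed A
ss_b = [41, 260, 230, 170, 351, 249, 61, 87, 180, 104, 20, 289]  # X, Subwatershed B


def simple_regression(x, y):
    """Least squares line, Fs and r^2 for a regression of y on x (illustrative helper)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sxx = sum(v * v for v in x) - sum_x ** 2 / n
    syy = sum(v * v for v in y) - sum_y ** 2 / n
    sxy = sum(u * v for u, v in zip(x, y)) - sum_x * sum_y / n
    b = sxy / sxx
    a = (sum_y - b * sum_x) / n
    ss_reg = sxy ** 2 / sxx
    ss_res = syy - ss_reg
    f_s = ss_reg / (ss_res / (n - 2))  # 1 and n - 2 degrees of freedom
    return a, b, f_s, ss_reg / syy


a, b, f_s, r2 = simple_regression(ss_b, ss_a)
print(f"SSa = {a:.4f} + {b:.4f} SSb;  Fs = {f_s:.2f};  r^2 = {r2:.2f}")
```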


11.2.7  Analysis  of  Covariance  to  Compare  the  Simple  Linear  Regression 
Lines  Developed  from  Several  Groups  of  Data. 

In  some  cases,  we  may  want  to  compare  two  or  more  simple  linear 
regression  lines  and  determine  if  they  differ  in  either  their  slope  or  in 
their  level.  The  statistical  method  to  use  in  this  situation  is  the 
analysis  of  covariance.  The  procedures  necessary  for  this  analysis  can 
best  be  described  using  an  illustrative  example.  Much  of  the  discussion 
that  follows  has  been  taken  directly  from  Freese  (1967). 

Consider  two  groups,  A and  B,  composed  of  X and  Y data.  This  could 
represent  a control  watershed  and  a treated  watershed  (X  and  Y, 
respectively) where data were collected prior to and following treatment
(groups  A and  B,  respectively).  Linear  regressions  of  Y on  X were  fitted 
for  each  of  the  two  groups.  The  basic  data  and  the  fitted  regressions  were 
as  follows: 

Group A                                                          Sum    Mean
Y      3    7    9    6    8   13   10   12   14                    82   9.111
X      1    4    7    7    2    9   10    6   12                    58   6.444

n = 9, ΣY² = 848, ΣXY = 609, ΣX² = 480, Σy² = 100.8889,
Σxy = 80.5556, Σx² = 106.2222, Y = 4.224 + 0.7584X
Residual SS = 39.7980, with 7 df.

Group B                                                          Sum    Mean
Y      4    6   12    2    8    7    0    5    9    2   11    3   10    79   6.077
X      4    9   14    6    9   12    2    7    5    5   11    2   13    99   7.615

n = 13, ΣY² = 653, ΣXY = 753, ΣX² = 951, Σy² = 172.9231,
Σxy = 151.3846, Σx² = 197.0769, Y = 0.228 + 0.7681X
Residual SS = 56.6370, with 11 df.
In testing for common regressions the procedure is to test first for
common slopes. If the slopes differ significantly, the regressions are
different and no further testing is needed. If the slopes are not
significantly different, the difference in level is tested. The analysis
table is as follows:

Table 36. Analysis of Covariance Table (after Freese, 1967).

                                                                           Residuals
Line  Group                                  df    Σy²        Σxy        Σx²        df    SS         MS
1     A                                       8    100.8889    80.5556   106.2222    7    39.7980
2     B                                      12    172.9231   151.3846   197.0769   11    56.6370
3     Pooled residuals                                                              18    96.4350    5.3575
4     Difference for testing common slopes                                           1     0.0067    0.0067
5     Common slope                           20    273.8120   231.9402   303.2991   19    96.4417    5.0759
6     Difference for testing levels                                                  1    80.1954   80.1954
7     Single regression                      21    322.7727   213.0455   310.5909   20   176.6371

The  first  two  lines  in  this  table  contain  the  basic  data  for  the  two 
groups.  To  the  left  are  the  total  df  for  the  groups  (8  for  A and  12  for 
B).  In  the  center  are  the  corrected  sums  of  squares  and  products.  The 
right  side  of  the  table  gives  the  residual  sums  of  squares  and  df.  Since 
only  simple  linear  regressions  have  been  fitted,  the  residual  df  for  each 
group  is  one  less  than  the  total  df.  The  residual  sum  of  squares  is 
obtained  by  first  computing  the  reduction  sum  of  squares  (SS  due  to 
regression)  for  each  group  (Equation  46).  This  reduction  is  then 
subtracted from the total sum of squares (Σy²) to give the residuals.

$$SS_{reg} = \frac{(\sum xy)^2}{\sum x^2} \qquad (46)$$


Line  3 is  obtained  by  pooling  the  residual  df  and  residual  sums  of 
squares  for  the  groups.  Dividing  the  pooled  sum  of  squares  by  the  pooled 
df  gives  the  pooled  mean  square. 

The  left  side  and  center  of  line  5 are  obtained  by  pooling  the  total 
df  and  the  corrected  sums  of  squares  and  products  for  the  groups.  These 
are the values that are obtained under the assumption of no difference in
the slopes of the group regressions. If the assumption is wrong, the
residuals about this common slope regression will be considerably larger
than the mean square residual about the separate regressions. The residual
df and sum of squares are obtained by fitting a straight line to these
pooled data. The residual df is one less than the total df. The residual
sum of squares is

$$SS_{res} = 273.8120 - \frac{(231.9402)^2}{303.2991} = 96.4417$$


Now  the  difference  between  these  residuals  (line  4 = line  5 - line  3) 
provides  a test  of  the  hypothesis  of  common  slopes.  The  error  term  for 
this  test  is  the  pooled  mean  square  from  line  3. 

Test of common slopes:

$$F_{(1,18)} = \frac{0.0067}{5.3575} = 0.0013$$

The difference is not significant.

If  the  slopes  differed  significantly,  the  groups  would  have  different 
regressions,  and  we  would  stop  here.  Since  the  slopes  did  not  differ,  we 
now  go  on  to  test  for  a difference  in  the  levels  of  the  regression. 

Line  7 is  what  we  would  have  if  we  ignored  the  groups  entirely,  lumped 
all  the  original  observations  together  and  fitted  a single  linear 
regression.  The  combined  data  are  as  follows: 

n = (9 + 13) = 22 (so the df for total = 21)

ΣY = (82 + 79) = 161, ΣY² = (848 + 653) = 1,501

$$\sum y^2 = 1{,}501 - \frac{(161)^2}{22} = 322.7727$$

ΣX = (58 + 99) = 157, ΣX² = (480 + 951) = 1,431

$$\sum x^2 = 1{,}431 - \frac{(157)^2}{22} = 310.5909$$

ΣXY = (609 + 753) = 1,362

$$\sum xy = 1{,}362 - \frac{(157)(161)}{22} = 213.0455$$

From this we obtain the residual values on the right side of line 7.

$$SS_{res} = 322.7727 - \frac{(213.0455)^2}{310.5909} = 176.6371$$

If  there  is  a real  difference  among  the  levels  of  the  groups,  the 
residuals  about  this  single  regression  will  be  considerably  larger  than  the 
mean  square  residual  about  the  regression  that  assumed  the  same  slopes  but 
different  levels.  This  difference  (line  6 = line  7 - line  5)  is  tested 
against  the  residual  mean  square  from  line  5. 

Test of levels:

$$F_{(1,19)} = \frac{80.1954}{5.0759} = 15.80^{**}$$

As the levels differ significantly, the groups do not have the same
regressions.
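The bookkeeping behind Table 36 is mechanical enough to script. The sketch below, with hypothetical helper names, rebuilds lines 3 through 7 from the raw group data listed above and returns the two F ratios. It assumes a simple linear regression within each group, as in the text, and the same steps apply unchanged to more than two groups.

```python
# Raw data for the two groups as (X, Y) lists, from the listing above
group_a = ([1, 4, 7, 7, 2, 9, 10, 6, 12], [3, 7, 9, 6, 8, 13, 10, 12, 14])
group_b = ([4, 9, 14, 6, 9, 12, 2, 7, 5, 5, 11, 2, 13],
           [4, 6, 12, 2, 8, 7, 0, 5, 9, 2, 11, 3, 10])


def corrected_sums(x, y):
    """Total df and corrected sums of y^2, xy, x^2 for one group (lines 1 and 2)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    syy = sum(v * v for v in y) - sum_y ** 2 / n
    sxy = sum(u * v for u, v in zip(x, y)) - sum_x * sum_y / n
    sxx = sum(v * v for v in x) - sum_x ** 2 / n
    return n - 1, syy, sxy, sxx


def residual(df, syy, sxy, sxx):
    """Residual df and SS after fitting one straight line (reduction from Equation 46)."""
    return df - 1, syy - sxy ** 2 / sxx


def ancova(groups):
    """Return (F for common slopes, F for levels), hypothetical helper."""
    lines = [corrected_sums(x, y) for x, y in groups]
    separate = [residual(*line) for line in lines]
    pooled_df = sum(df for df, _ in separate)           # line 3
    pooled_ss = sum(ss for _, ss in separate)
    common = [sum(col) for col in zip(*lines)]          # line 5, left side and center
    common_df, common_ss = residual(*common)
    all_x = [v for x, _ in groups for v in x]           # line 7: lump all observations
    all_y = [v for _, y in groups for v in y]
    single_df, single_ss = residual(*corrected_sums(all_x, all_y))
    f_slopes = (((common_ss - pooled_ss) / (common_df - pooled_df))
                / (pooled_ss / pooled_df))              # line 4 vs line 3
    f_levels = (((single_ss - common_ss) / (single_df - common_df))
                / (common_ss / common_df))              # line 6 vs line 5
    return f_slopes, f_levels


f_slopes, f_levels = ancova([group_a, group_b])
print(f"slopes: F = {f_slopes:.4f}   levels: F = {f_levels:.2f}")
```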


11.3  Multiple  Regression 
11.3.1  Assumptions 

The  assumptions  underlying  the  methods  of  fitting  a multiple 
regression  are  the  same  as  those  for  a simple  linear  regression  (Section 
11.2.1). 


11.3.2  Least  Squares  Determination  of  the  Multiple  Regression  Line 

In some situations, the dependent variable (Y) is related to more than
one independent variable (X1, X2, X3, ..., Xn). As we have
shown, we could fit a simple linear regression to the data using only one
independent variable. However, the regression would probably explain much
more of the variation if we used all the independent variables together.
The statistical method which enables us to do this is called multiple
regression.

As was the case with simple linear regression, the best regression
line can be determined using the method of least squares. Again, a series
of normal equations is established and solved simultaneously. For the
general linear model with a constant term

$$Y = a + b_1X_1 + b_2X_2 + \cdots + b_nX_n \qquad (47)$$

it is very easy to develop the normal equations, once you recognize the
pattern. Each term in the first row contains an x1; each term in the
second row contains an x2; and so on through the nth row. Each term in
the first column has an x1 and b1; each term in the second column has an
x2 and b2; and so on through the nth column. The ith row will contain
a term of the form (Σxi²)bi. On the right side of the equations, the ith
equation has a term of Σxiy. For the general model presented above
(Equation 47) the normal equations would be as follows:

$$(\sum x_1^2)b_1 + (\sum x_1x_2)b_2 + (\sum x_1x_3)b_3 + \cdots + (\sum x_1x_n)b_n = \sum x_1y \qquad (48)$$

$$(\sum x_1x_2)b_1 + (\sum x_2^2)b_2 + (\sum x_2x_3)b_3 + \cdots + (\sum x_2x_n)b_n = \sum x_2y \qquad (49)$$

$$(\sum x_1x_3)b_1 + (\sum x_2x_3)b_2 + (\sum x_3^2)b_3 + \cdots + (\sum x_3x_n)b_n = \sum x_3y \qquad (50)$$

$$\vdots$$

$$(\sum x_1x_n)b_1 + (\sum x_2x_n)b_2 + (\sum x_3x_n)b_3 + \cdots + (\sum x_n^2)b_n = \sum x_ny \qquad (51)$$

Once we have obtained the "b" coefficients, the constant term can be found
using the general equation

$$a = \bar{Y} - b_1\bar{X}_1 - b_2\bar{X}_2 - b_3\bar{X}_3 - \cdots - b_n\bar{X}_n \qquad (52)$$

The mechanics of the least squares determination of the multiple
regression line are outlined using an illustrative example. (The example
presented here has been taken directly from Freese, 1967. It is included
with only minor modification because I feel it presents the concept in a
very clear and concise manner.) Consider the data presented in Table
37. With these data we would like to fit an equation of the form

$$Y = a + b_1X_1 + b_2X_2 + b_3X_3$$


Table 37. Data for the example illustrating the least squares
determination of the multiple regression line (after Freese, 1967).

  Y       X1      X2      X3
  65      41      79      75
  78      90      48      83
  85      53      67      74
  50      42      52      61
  55      57      52      59
  59      32      82      73
  82      71      80      72
  66      60      65      66
 113      98      96      99
  86      80      81      90
 104     101      78      86
  92     100      59      88
  96      84      84      93
  65      72      48      70
  81      55      93      85
  77      77      68      71
  83      98      51      84
  97      95      82      81
  90      90      70      78
  87      93      61      89
  74      45      96      81
  70      50      80      77
  75      60      76      70
  75      68      74      76
  93      75      96      85
  76      82      58      80
  71      72      58      68
  61      46      69      65

Sums    2,206    1,987    2,003    2,179
Means   78.7857  70.9643  71.5357  77.8214

(n = 28)
According to the principle of least squares, the best estimates of the
"b" coefficients can be obtained by solving the set of least squares normal
equations.

$$b_1 \text{ equation:}\quad (\sum x_1^2)b_1 + (\sum x_1x_2)b_2 + (\sum x_1x_3)b_3 = \sum x_1y$$

$$b_2 \text{ equation:}\quad (\sum x_1x_2)b_1 + (\sum x_2^2)b_2 + (\sum x_2x_3)b_3 = \sum x_2y$$

$$b_3 \text{ equation:}\quad (\sum x_1x_3)b_1 + (\sum x_2x_3)b_2 + (\sum x_3^2)b_3 = \sum x_3y$$

where:

$$\sum x_ix_j = \sum X_iX_j - \frac{(\sum X_i)(\sum X_j)}{n}$$

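The corrected sums of squares and products are computed in the
familiar manner:

$$\sum y^2 = \sum Y^2 - \frac{(\sum Y)^2}{n} = (65^2 + \cdots + 61^2) - \frac{(2{,}206)^2}{28} = 5{,}974.7143$$

$$\sum x_1^2 = \sum X_1^2 - \frac{(\sum X_1)^2}{n} = (41^2 + \cdots + 46^2) - \frac{(1{,}987)^2}{28} = 11{,}436.9643$$

$$\sum x_1y = \sum X_1Y - \frac{(\sum X_1)(\sum Y)}{n} = (41)(65) + \cdots + (46)(61) - \frac{(1{,}987)(2{,}206)}{28} = 6{,}428.7858$$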


Similarly,

Σx1x2 = -1,171.4642        Σx2x3 = 1,789.6786
Σx1x3 = 3,458.8215         Σx2y  = 2,632.2143
Σx2²  = 5,998.9643         Σx3²  = 2,606.1072
                           Σx3y  = 3,327.9286

Putting these values in the normal equations gives:

$$11{,}436.9643b_1 - 1{,}171.4642b_2 + 3{,}458.8215b_3 = 6{,}428.7858$$

$$-1{,}171.4642b_1 + 5{,}998.9643b_2 + 1{,}789.6786b_3 = 2{,}632.2143$$

$$3{,}458.8215b_1 + 1{,}789.6786b_2 + 2{,}606.1072b_3 = 3{,}327.9286$$


These equations can be solved by any of the standard procedures for
simultaneous equations. One approach is as follows:
1. Divide through each equation by the numerical coefficient of b1.

   b1 - 0.102,427,897b2 + 0.302,424,788b3 = 0.562,105,960
   b1 - 5.120,911,334b2 - 1.527,727,949b3 = -2.246,943,867
   b1 + 0.517,424,389b2 + 0.753,466,809b3 = 0.962,156,792

2. Subtract the second equation from the first and the third from
   the first so as to leave two equations in b2 and b3.

   5.018,483,437b2 + 1.830,152,737b3 = 2.809,049,827
   -0.619,852,286b2 - 0.451,042,021b3 = -0.400,050,832

3. Divide through each equation by the numerical coefficient of b2.

   b2 + 0.364,682,430b3 = 0.559,740,779
   b2 + 0.727,660,494b3 = 0.645,397,042

4. Subtract the second of these equations from the first, leaving
   one equation in b3.

   -0.362,978,064b3 = -0.085,656,263

5. Solve for b3.

   b3 = -0.085,656,263 / -0.362,978,064 = 0.235,981,927

6. Substitute this value of b3 in one of the equations (say the
   first) of step 3 and solve for b2.

   b2 + (0.364,682,430)(0.235,981,927) = 0.559,740,779
   b2 = 0.473,682,316

7. Substitute the solutions for b2 and b3 in one of the
   equations (say the first) of step 1, and solve for b1.

   b1 - (0.102,427,897)(0.473,682,316) + (0.302,424,788)(0.235,981,927) = 0.562,105,960
   b1 = 0.539,257,459

8. As a check, add up the original normal equations and substitute
   the solutions for b1, b2 and b3.

   13,724.3216b1 + 6,617.1787b2 + 7,854.6073b3 = 12,388.9287
   12,388.92869 = 12,388.9287, check.

Given the values of b1, b2, and b3 we can now compute

   a = Ȳ - b1X̄1 - b2X̄2 - b3X̄3 = -11.7320

Thus, after rounding of the coefficients, the regression equation is

   Y = -11.732 + 0.539X1 + 0.474X2 + 0.236X3

It  should  be  noted  that  in  solving  the  normal  equations  more  digits  have 
been  carried  than  would  be  justified  by  the  rules  for  number  of  significant 
digits.  Unless  this  is  done,  the  rounding  errors  may  make  it  difficult  to 
check  the  computations. 
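Steps 1 to 7 amount to Gaussian elimination. The sketch below starts from the raw Table 37 data, forms the corrected sums of squares and products, and solves the normal equations by Gaussian elimination with partial pivoting (a numerically safer ordering of the same eliminations, not Freese's exact sequence of steps). It takes X1 in row 26 as 82, the value required by the column sum of 1,987. The helper names are hypothetical.

```python
# Table 37 rows: (Y, X1, X2, X3)
ROWS = [
    (65, 41, 79, 75), (78, 90, 48, 83), (85, 53, 67, 74), (50, 42, 52, 61),
    (55, 57, 52, 59), (59, 32, 82, 73), (82, 71, 80, 72), (66, 60, 65, 66),
    (113, 98, 96, 99), (86, 80, 81, 90), (104, 101, 78, 86), (92, 100, 59, 88),
    (96, 84, 84, 93), (65, 72, 48, 70), (81, 55, 93, 85), (77, 77, 68, 71),
    (83, 98, 51, 84), (97, 95, 82, 81), (90, 90, 70, 78), (87, 93, 61, 89),
    (74, 45, 96, 81), (70, 50, 80, 77), (75, 60, 76, 70), (75, 68, 74, 76),
    (93, 75, 96, 85), (76, 82, 58, 80), (71, 72, 58, 68), (61, 46, 69, 65),
]


def corrected_product(u, v):
    """Corrected sum of products: sum(uv) - (sum u)(sum v)/n."""
    n = len(u)
    return sum(p * q for p, q in zip(u, v)) - sum(u) * sum(v) / n


def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small linear system."""
    m = len(rhs)
    aug = [list(row) + [r] for row, r in zip(matrix, rhs)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, m):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, m + 1):
                aug[i][j] -= factor * aug[col][j]
    x = [0.0] * m
    for i in reversed(range(m)):  # back substitution
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, m))
        x[i] = (aug[i][m] - tail) / aug[i][i]
    return x


def fit_multiple(rows):
    """Return the constant a and coefficients [b1, b2, ...] (hypothetical helper)."""
    y = [r[0] for r in rows]
    xs = [[r[k] for r in rows] for k in range(1, len(rows[0]))]
    lhs = [[corrected_product(xi, xj) for xj in xs] for xi in xs]  # sum x_i x_j
    rhs = [corrected_product(xi, y) for xi in xs]                  # sum x_i y
    b = solve(lhs, rhs)
    n = len(y)
    a = sum(y) / n - sum(bi * sum(xi) / n for bi, xi in zip(b, xs))  # Equation 52
    return a, b


a, b = fit_multiple(ROWS)
print(f"Y = {a:.3f} + {b[0]:.3f} X1 + {b[1]:.3f} X2 + {b[2]:.3f} X3")
```

Double-precision arithmetic carries roughly 15 to 16 significant digits, so the rounding concern noted above is much less pressing here, although badly conditioned normal equations can still lose accuracy.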

11.3.3  Testing  the  Significance  of  the  Multiple  Regression  Line 

At  this  point  in  our  regression  analysis  we  should  ask,  "How  well  does 
the  regression  line  fit  the  data?"  The  analysis  of  variance  procedure  to 
be  used  here  is  similar  to  that  outlined  for  the  significance  test  of  the 
simple  linear  regression  line.  However,  in  this  case  the  degrees  of 
freedom  for  the  reduction  are  equal  to  the  number  of  independent  variables 
fitted.  The  reduction  sum  of  squares  for  any  least  squares  regression  can 
be  found  using  the  general  equation: 

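$$SS_{reg} = \sum_i b_i\left(\sum x_iy\right) \qquad (53)$$

that is, the sum of the products of each estimated coefficient and the right
side of its respective normal equation. Therefore, for Freese's example

$$SS_{reg} = b_1(\sum x_1y) + b_2(\sum x_2y) + b_3(\sum x_3y)$$

The anova table for the test of significance is given in Table 38.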
Table 38. ANOVA results for the test of significance of the multiple
regression developed from the data given in Table 37 (after Freese, 1967).

Source                               df    SS           MS
Reduction due to X1, X2, and X3       3    5,498.9389   1,832.9796
Residuals                            24      475.7754      19.8240
Total                                27    5,974.7143

To test the significance of the regression we compute Fs, where

$$F_s = \frac{\text{Reduction MS}}{\text{Residual MS}} = \frac{1{,}832.9796}{19.8240}$$

For the case at hand Fs = 92.46, which is significant at the 0.01 level.

In  some  instances  we  would  like  to  know  the  contribution  each 
independent variable makes in the prediction of the dependent variable; in
other words, we want to know what portion of the total SS can be attributed to each
individual  independent  variable.  The  statistical  method  to  use  for  this 
type  of  analysis  is  stepwise  regression.  The  procedures  for  stepwise 
regression  are  not  outlined  here,  but  can  be  found  in  any  good  statistics 
text. 


11.3.4  Coefficient  of  Multiple  Determination 

The coefficient of multiple determination is calculated in the same
manner as that for the simple linear regression.

$$r^2 = \frac{SS_{reg}}{SS_{tot}} \qquad (54)$$

For our illustrative example, r² = 0.92, which means 92 percent of
the variation in Y is associated with the regression.
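Equations 53 and 54 and the F ratio of Table 38 follow directly from the coefficients and the right-hand sides of the normal equations. A minimal sketch using the values reported in the text for Freese's example (n = 28 observations, k = 3 independent variables, so the residual df is n - k - 1 = 24):

```python
b = [0.539257459, 0.473682316, 0.235981927]  # b1, b2, b3 from steps 5 to 7
rhs = [6428.7858, 2632.2143, 3327.9286]      # sum x1y, sum x2y, sum x3y
ss_total = 5974.7143                         # corrected sum y^2
n, k = 28, 3

ss_reg = sum(bi * ri for bi, ri in zip(b, rhs))  # Equation 53
ss_res = ss_total - ss_reg
ms_reg = ss_reg / k
ms_res = ss_res / (n - k - 1)
f_s = ms_reg / ms_res
r2 = ss_reg / ss_total                           # Equation 54

print(f"SSreg = {ss_reg:.4f}, Fs = {f_s:.2f}, r^2 = {r2:.2f}")
```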




11.4  Curvilinear  Regression 

Only  a very  brief  discussion  of  curvilinear  regression  is  presented 
here.  In  general,  many  of  the  curvilinear  relationships  can  be  handled 
using  the  regression  methods  already  presented.  Consider  the  simple  power 
function 

$$Y = aX^b \qquad (55)$$

Water quality data collected from streams will very often follow this
relationship. This function can be linearized using a simple log
transformation (Equation 56). In addition, many other curvilinear
relationships can be linearized. Chow (1964) presents a very extensive
tabulation of transformations for linearization of different equations
(Table 39).

$$\log Y = \log a + b\log X \qquad (56)$$
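A minimal sketch of fitting Equation 55 by way of Equation 56: take logarithms, fit a straight line by least squares, and transform the intercept back. The data are hypothetical, generated exactly from Y = 2X^1.5 so the recovered a and b can be checked, and fit_power is an illustrative name. This approach minimizes the squared deviations of log Y rather than of Y, and it requires all X and Y values to be positive.

```python
import math


def fit_power(x, y):
    """Fit Y = a X^b by least squares on log10 Y = log10 a + b log10 X (illustrative helper)."""
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    n = len(lx)
    sum_lx, sum_ly = sum(lx), sum(ly)
    sxx = sum(v * v for v in lx) - sum_lx ** 2 / n
    sxy = sum(u * v for u, v in zip(lx, ly)) - sum_lx * sum_ly / n
    b = sxy / sxx
    log_a = (sum_ly - b * sum_lx) / n
    return 10 ** log_a, b


# Hypothetical, noise-free data generated from Y = 2 X^1.5
x = [1, 2, 5, 10, 20, 50]
y = [2 * v ** 1.5 for v in x]
a, b = fit_power(x, y)
print(f"a = {a:.3f}, b = {b:.3f}")
```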

However, some curvilinear functions, such as

$$Y = a + b^X$$

or

$$Y = a(X - b)^2$$

cannot be fitted by the methods already described. To regress these
functions requires procedures beyond the scope of this paper.
Table 39. Transformations for linearization of different functions (after
Chow, 1967).

(Columns: Type of function, numbered 1 to 16; Straight-line coordinates,
abscissa and ordinate; Equation in linear form.)

In types 14 and 15, yk, yk+1, ... are consecutive values of y for an
increment Δx.
APPENDIX A
STATISTICAL TABLES

Table A-1. Probability of a random value of z = (X - μ)/σ being greater than
the values tabulated in the margins (Steel and Torrie, 1960).

 z     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
 .0   .5000  .4960  .4920  .4880  .4840  .4801  .4761  .4721  .4681  .4641
 .1   .4602  .4562  .4522  .4483  .4443  .4404  .4364  .4325  .4286  .4247
 .2   .4207  .4168  .4129  .4090  .4052  .4013  .3974  .3936  .3897  .3859
 .3   .3821  .3783  .3745  .3707  .3669  .3632  .3594  .3557  .3520  .3483
 .4   .3446  .3409  .3372  .3336  .3300  .3264  .3228  .3192  .3156  .3121
 .5   .3085  .3050  .3015  .2981  .2946  .2912  .2877  .2843  .2810  .2776
 .6   .2743  .2709  .2676  .2643  .2611  .2578  .2546  .2514  .2483  .2451
 .7   .2420  .2389  .2358  .2327  .2296  .2266  .2236  .2206  .2177  .2148
 .8   .2119  .2090  .2061  .2033  .2005  .1977  .1949  .1922  .1894  .1867
 .9   .1841  .1814  .1788  .1762  .1736  .1711  .1685  .1660  .1635  .1611
1.0   .1587  .1562  .1539  .1515  .1492  .1469  .1446  .1423  .1401  .1379
1.1   .1357  .1335  .1314  .1292  .1271  .1251  .1230  .1210  .1190  .1170
1.2   .1151  .1131  .1112  .1093  .1075  .1056  .1038  .1020  .1003  .0985
1.3   .0968  .0951  .0934  .0918  .0901  .0885  .0869  .0853  .0838  .0823
1.4   .0808  .0793  .0778  .0764  .0749  .0735  .0721  .0708  .0694  .0681
1.5   .0668  .0655  .0643  .0630  .0618  .0606  .0594  .0582  .0571  .0559
1.6   .0548  .0537  .0526  .0516  .0505  .0495  .0485  .0475  .0465  .0455
1.7   .0446  .0436  .0427  .0418  .0409  .0401  .0392  .0384  .0375  .0367
1.8   .0359  .0351  .0344  .0336  .0329  .0322  .0314  .0307  .0301  .0294
1.9   .0287  .0281  .0274  .0268  .0262  .0256  .0250  .0244  .0239  .0233
2.0   .0228  .0222  .0217  .0212  .0207  .0202  .0197  .0192  .0188  .0183
2.1   .0179  .0174  .0170  .0166  .0162  .0158  .0154  .0150  .0146  .0143
2.2   .0139  .0136  .0132  .0129  .0125  .0122  .0119  .0116  .0113  .0110
2.3   .0107  .0104  .0102  .0099  .0096  .0094  .0091  .0089  .0087  .0084
2.4   .0082  .0080  .0078  .0075  .0073  .0071  .0069  .0068  .0066  .0064
2.5   .0062  .0060  .0059  .0057  .0055  .0054  .0052  .0051  .0049  .0048
2.6   .0047  .0045  .0044  .0043  .0041  .0040  .0039  .0038  .0037  .0036
2.7   .0035  .0034  .0033  .0032  .0031  .0030  .0029  .0028  .0027  .0026
2.8   .0026  .0025  .0024  .0023  .0023  .0022  .0021  .0021  .0020  .0019
2.9   .0019  .0018  .0018  .0017  .0016  .0016  .0015  .0015  .0014  .0014
3.0   .0013  .0013  .0013  .0012  .0012  .0011  .0011  .0011  .0010  .0010
3.1   .0010  .0009  .0009  .0009  .0008  .0008  .0008  .0008  .0007  .0007
3.2   .0007  .0007  .0006  .0006  .0006  .0006  .0006  .0005  .0005  .0005
3.3   .0005  .0005  .0005  .0004  .0004  .0004  .0004  .0004  .0004  .0003
3.4   .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0003  .0002
3.6   .0002  .0002  .0001  .0001  .0001  .0001  .0001  .0001  .0001  .0001
3.9   .0000

Table A-2. Values of t (Steel and Torrie, 1960)

              Probability of a larger value of t, sign ignored
df     0.5     0.4     0.3     0.2     0.1     0.05     0.02     0.01     0.001
 1   1.000   1.376   1.963   3.078   6.314   12.706   31.821   63.657   636.619
 2    .816   1.061   1.386   1.886   2.920    4.303    6.965    9.925    31.598
 3    .765    .978   1.250   1.638   2.353    3.182    4.541    5.841    12.941
 4    .741    .941   1.190   1.533   2.132    2.776    3.747    4.604     8.610
 5    .727    .920   1.156   1.476   2.015    2.571    3.365    4.032     6.859
 6    .718    .906   1.134   1.440   1.943    2.447    3.143    3.707     5.959
 7    .711    .896   1.119   1.415   1.895    2.365    2.998    3.499     5.405
 8    .706    .889   1.108   1.397   1.860    2.306    2.896    3.355     5.041
 9    .703    .883   1.100   1.383   1.833    2.262    2.821    3.250     4.781
10    .700    .879   1.093   1.372   1.812    2.228    2.764    3.169     4.587
11    .697    .876   1.088   1.363   1.796    2.201    2.718    3.106     4.437
12    .695    .873   1.083   1.356   1.782    2.179    2.681    3.055     4.318
13    .694    .870   1.079   1.350   1.771    2.160    2.650    3.012     4.221
14    .692    .868   1.076   1.345   1.761    2.145    2.624    2.977     4.140
15    .691    .866   1.074   1.341   1.753    2.131    2.602    2.947     4.073
16    .690    .865   1.071   1.337   1.746    2.120    2.583    2.921     4.015
17    .689    .863   1.069   1.333   1.740    2.110    2.567    2.898     3.965
18    .688    .862   1.067   1.330   1.734    2.101    2.552    2.878     3.922
19    .688    .861   1.066   1.328   1.729    2.093    2.539    2.861     3.883
20    .687    .860   1.064   1.325   1.725    2.086    2.528    2.845     3.850
21    .686    .859   1.063   1.323   1.721    2.080    2.518    2.831     3.819
22    .686    .858   1.061   1.321   1.717    2.074    2.508    2.819     3.792
23    .685    .858   1.060   1.319   1.714    2.069    2.500    2.807     3.767
24    .685    .857   1.059   1.318   1.711    2.064    2.492    2.797     3.745
25    .684    .856   1.058   1.316   1.708    2.060    2.485    2.787     3.725
26    .684    .856   1.058   1.315   1.706    2.056    2.479    2.779     3.707
27    .684    .855   1.057   1.314   1.703    2.052    2.473    2.771     3.690
28    .683    .855   1.056   1.313   1.701    2.048    2.467    2.763     3.674
29    .683    .854   1.055   1.311   1.699    2.045    2.462    2.756     3.659
30    .683    .854   1.055   1.310   1.697    2.042    2.457    2.750     3.646
40    .681    .851   1.050   1.303   1.684    2.021    2.423    2.704     3.551

60 

.679 

.848 

1 

.046 

1 

.296 

l 

.671 

2 

000 

2 

390 

2 

660 

3 

460 

120 

.677 

$4o 

1 

.041 

1 

.289 

1 

.658 

1 

980 

2 

358 

2 

617 

3 

373 

0© 

.674 

.842 

1 

.036 

1 

.282 

1 

.645 

l 

960 

2 

326 

2 

576 

3 

291 

0.25 

0.2 

0 

.15 

0 

1 

0 

.05 

0 

.025 

0 

.01 

0 

.005 

0 

0005 

Probability  of  a larger  value  of  t,  sign  considered 


Source: This table is abridged from Table III of Fisher and Yates, Statistical Tables for
Biological, Agricultural, and Medical Research, published by Oliver and Boyd Ltd., Edinburgh,
1949, by permission of the authors and publishers.
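Table A-2 lists, for each df, the value of t that is exceeded in absolute value with the stated probability. As a sketch under stated assumptions (Python 3 standard library only, integer df, hypothetical helper names `t_two_sided_p` and `t_critical`), those critical values can be recomputed from the closed-form series for the t distribution function given in Abramowitz and Stegun, Handbook of Mathematical Functions, Section 26.7, combined with bisection.

```python
import math


def t_two_sided_p(t, df):
    """P(|T| > t) for Student's t with a positive integer df.

    Uses the finite series for A(t|df) = P(|T| < t) with theta = atan(t / sqrt(df)).
    """
    theta = math.atan(t / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    c2 = c * c
    if df % 2 == 1:
        # Odd df: A = (2/pi) * [theta + sin(theta) * (cos + (2/3)cos^3 + ...)]
        series = 0.0
        if df > 1:
            term = c
            series = term
            for k in range(3, df - 1, 2):
                term *= c2 * (k - 1) / k
                series += term
        a = 2.0 / math.pi * (theta + s * series)
    else:
        # Even df: A = sin(theta) * (1 + (1/2)cos^2 + (1*3)/(2*4)cos^4 + ...)
        term = 1.0
        series = 1.0
        for k in range(2, df - 1, 2):
            term *= c2 * (k - 1) / k
            series += term
        a = s * series
    return 1.0 - a


def t_critical(p_two_sided, df, tol=1e-10):
    """Value of t exceeded in absolute value with probability p_two_sided."""
    lo, hi = 0.0, 1.0
    while t_two_sided_p(hi, df) > p_two_sided:  # widen until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:                         # bisection on a decreasing function
        mid = 0.5 * (lo + hi)
        if t_two_sided_p(mid, df) > p_two_sided:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for df in (1, 10, 30):
        print(df, [round(t_critical(p, df), 3) for p in (0.05, 0.01)])
```

For the "sign considered" (one-sided) scale, pass twice the one-sided probability; for example, `t_critical(0.05, df)` gives the entry under 0.025 on that scale.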


Table A-3.  Values of χ² (Steel and Torrie, 1960).

                   Probability of a larger value of χ²
df        .995     .990     .975    .950   .900   .750   .500   .250   .100   .050   .025   .010   .005
1    0.0000393 0.000157 0.000982 0.00393 0.0158  0.102  0.455   1.32   2.71   3.84   5.02   6.63   7.88
2       0.0100   0.0201   0.0506   0.103  0.211  0.575   1.39   2.77   4.61   5.99   7.38   9.21   10.6
3       0.0717    0.115    0.216   0.352  0.584   1.21   2.37   4.11   6.25   7.81   9.35   11.3   12.8
4        0.207    0.297    0.484   0.711   1.06   1.92   3.36   5.39   7.78   9.49   11.1   13.3   14.9
5        0.412    0.554    0.831    1.15   1.61   2.67   4.35   6.63   9.24   11.1   12.8   15.1   16.7
6        0.676    0.872     1.24    1.64   2.20   3.45   5.35   7.84   10.6   12.6   14.4   16.8   18.5
7        0.989     1.24     1.69    2.17   2.83   4.25   6.35   9.04   12.0   14.1   16.0   18.5   20.3
8         1.34     1.65     2.18    2.73   3.49   5.07   7.34   10.2   13.4   15.5   17.5   20.1   22.0
9         1.73     2.09     2.70    3.33   4.17   5.90   8.34   11.4   14.7   16.9   19.0   21.7   23.6
10        2.16     2.56     3.25    3.94   4.87   6.74   9.34   12.5   16.0   18.3   20.5   23.2   25.2
11        2.60     3.05     3.82    4.57   5.58   7.58   10.3   13.7   17.3   19.7   21.9   24.7   26.8
12        3.07     3.57     4.40    5.23   6.30   8.44   11.3   14.8   18.5   21.0   23.3   26.2   28.3
13        3.57     4.11     5.01    5.89   7.04   9.30   12.3   16.0   19.8   22.4   24.7   27.7   29.8
14        4.07     4.66     5.63    6.57   7.79   10.2   13.3   17.1   21.1   23.7   26.1   29.1   31.3
15        4.60     5.23     6.26    7.26   8.55   11.0   14.3   18.2   22.3   25.0   27.5   30.6   32.8
16        5.14     5.81     6.91    7.96   9.31   11.9   15.3   19.4   23.5   26.3   28.8   32.0   34.3
17        5.70     6.41     7.56    8.67   10.1   12.8   16.3   20.5   24.8   27.6   30.2   33.4   35.7
18        6.26     7.01     8.23    9.39   10.9   13.7   17.3   21.6   26.0   28.9   31.5   34.8   37.2
19        6.84     7.63     8.91    10.1   11.7   14.6   18.3   22.7   27.2   30.1   32.9   36.2   38.6
20        7.43     8.26     9.59    10.9   12.4   15.5   19.3   23.8   28.4   31.4   34.2   37.6   40.0
21        8.03     8.90     10.3    11.6   13.2   16.3   20.3   24.9   29.6   32.7   35.5   38.9   41.4
22        8.64     9.54     11.0    12.3   14.0   17.2   21.3   26.0   30.8   33.9   36.8   40.3   42.8
23        9.26     10.2     11.7    13.1   14.8   18.1   22.3   27.1   32.0   35.2   38.1   41.6   44.2
24        9.89     10.9     12.4    13.8   15.7   19.0   23.3   28.2   33.2   36.4   39.4   43.0   45.6
25        10.5     11.5     13.1    14.6   16.5   19.9   24.3   29.3   34.4   37.7   40.6   44.3   46.9
26        11.2     12.2     13.8    15.4   17.3   20.8   25.3   30.4   35.6   38.9   41.9   45.6   48.3
27        11.8     12.9     14.6    16.2   18.1   21.7   26.3   31.5   36.7   40.1   43.2   47.0   49.6
28        12.5     13.6     15.3    16.9   18.9   22.7   27.3   32.6   37.9   41.3   44.5   48.3   51.0
29        13.1     14.3     16.0    17.7   19.8   23.6   28.3   33.7   39.1   42.6   45.7   49.6   52.3
30        13.8     15.0     16.8    18.5   20.6   24.5   29.3   34.8   40.3   43.8   47.0   50.9   53.7
40        20.7     22.2     24.4    26.5   29.1   33.7   39.3   45.6   51.8   55.8   59.3   63.7   66.8
50        28.0     29.7     32.4    34.8   37.7   42.9   49.3   56.3   63.2   67.5   71.4   76.2   79.5
60        35.5     37.5     40.5    43.2   46.5   52.3   59.3   67.0   74.4   79.1   83.3   88.4   92.0

Source: This table is abridged from "Table of percentage points of the χ² distribution," Biometrika, 32: 188-189 (1941), by Catherine M. Thompson. It is published here with kind permission of the author and the editor of Biometrika.
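Each entry of Table A-3 is the value of χ² exceeded with the probability heading its column. A minimal sketch (Python standard library only; the function names `chi2_upper_p` and `chi2_critical` are hypothetical) recomputes those entries from the power series for the regularized lower incomplete gamma function, using P(X > x) = 1 - P(df/2, x/2), and then solves for x by bisection. The series is adequate for the df range tabulated here but is not tuned for very large arguments.

```python
import math


def chi2_upper_p(x, df):
    """P(X > x) for a chi-square variable with df degrees of freedom.

    Sums the series for the regularized lower incomplete gamma function
    P(a, z) with a = df/2, z = x/2, and returns 1 - P(a, z).
    """
    if x <= 0.0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a          # n = 0 term of sum z**n / (a (a+1) ... (a+n))
    total = term
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= z / (a + n)
        total += term
    log_prefactor = a * math.log(z) - z - math.lgamma(a)
    return 1.0 - total * math.exp(log_prefactor)


def chi2_critical(p_upper, df, tol=1e-10):
    """Value of chi-square exceeded with probability p_upper."""
    lo, hi = 0.0, 1.0
    while chi2_upper_p(hi, df) > p_upper:   # widen until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:                     # bisection on a decreasing function
        mid = 0.5 * (lo + hi)
        if chi2_upper_p(mid, df) > p_upper:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for df in (1, 10, 60):
        print(df, [round(chi2_critical(p, df), 2) for p in (0.995, 0.05, 0.005)])
```

For df = 2 the upper tail reduces to exp(-x/2), which gives a quick hand check on the code.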


Table A-4.  Values of F (Steel and Torrie, 1960)

Denom.  Prob. of                         Numerator df
df      larger F      1      2      3      4      5      6      7      8      9
1       .100      39.86  49.50  53.59  55.83  57.24  58.20  58.91  59.44  59.86
        .050      161.4  199.5  215.7  224.6  230.2  234.0  236.8  238.9  240.5
        .025      647.8  799.5  864.2  899.6  921.8  937.1  948.2  956.7  963.3
        .010       4052 4999.5   5403   5625   5764   5859   5928   5982   6022
        .005      16211  20000  21615  22500  23056  23437  23715  23925  24091
2       .100       8.53   9.00   9.16   9.24   9.29   9.33   9.35   9.37   9.38
        .050      18.51  19.00  19.16  19.25  19.30  19.33  19.35  19.37  19.38
        .025      38.51  39.00  39.17  39.25  39.30  39.33  39.36  39.37  39.39
        .010      98.50  99.00  99.17  99.25  99.30  99.33  99.36  99.37  99.39
        .005      198.5  199.0  199.2  199.2  199.3  199.3  199.4  199.4  199.4
3       .100       5.54   5.46   5.39   5.34   5.31   5.28   5.27   5.25   5.24
        .050      10.13   9.55   9.28   9.12   9.01   8.94   8.89   8.85   8.81
        .025      17.44  16.04  15.44  15.10  14.88  14.73  14.62  14.54  14.47
        .010      34.12  30.82  29.46  28.71  28.24  27.91  27.67  27.49  27.35
        .005      55.55  49.80  47.47  46.19  45.39  44.84  44.43  44.13  43.88
4       .100       4.54   4.32   4.19   4.11   4.05   4.01   3.98   3.95   3.94
        .050       7.71   6.94   6.59   6.39   6.26   6.16   6.09   6.04   6.00
        .025      12.22  10.65   9.98   9.60   9.36   9.20   9.07   8.98   8.90
        .010      21.20  18.00  16.69  15.98  15.52  15.21  14.98  14.80  14.66
        .005      31.33  26.28  24.26  23.15  22.46  21.97  21.62  21.35  21.14
5       .100       4.06   3.78   3.62   3.52   3.45   3.40   3.37   3.34   3.32
        .050       6.61   5.79   5.41   5.19   5.05   4.95   4.88   4.82   4.77
        .025      10.01   8.43   7.76   7.39   7.15   6.98   6.85   6.76   6.68
        .010      16.26  13.27  12.06  11.39  10.97  10.67  10.46  10.29  10.16
        .005      22.78  18.31  16.53  15.56  14.94  14.51  14.20  13.96  13.77
6       .100       3.78   3.46   3.29   3.18   3.11   3.05   3.01   2.98   2.96
        .050       5.99   5.14   4.76   4.53   4.39   4.28   4.21   4.15   4.10
        .025       8.81   7.26   6.60   6.23   5.99   5.82   5.70   5.60   5.52
        .010      13.75  10.92   9.78   9.15   8.75   8.47   8.26   8.10   7.98
        .005      18.63  14.54  12.92  12.03  11.46  11.07  10.79  10.57  10.39
7       .100       3.59   3.26   3.07   2.96   2.88   2.83   2.78   2.75   2.72
        .050       5.59   4.74   4.35   4.12   3.97   3.87   3.79   3.73   3.68
        .025       8.07   6.54   5.89   5.52   5.29   5.12   4.99   4.90   4.82
        .010      12.25   9.55   8.45   7.85   7.46   7.19   6.99   6.84   6.72
        .005      16.24  12.40  10.88  10.05   9.52   9.16   8.89   8.68   8.51
8       .100       3.46   3.11   2.92   2.81   2.73   2.67   2.62   2.59   2.56
        .050       5.32   4.46   4.07   3.84   3.69   3.58   3.50   3.44   3.39
        .025       7.57   6.06   5.42   5.05   4.82   4.65   4.53   4.43   4.36
        .010      11.26   8.65   7.59   7.01   6.63   6.37   6.18   6.03   5.91
        .005      14.69  11.04   9.60   8.81   8.30   7.95   7.69   7.50   7.34
9       .100       3.36   3.01   2.81   2.69   2.61   2.55   2.51   2.47   2.44
        .050       5.12   4.26   3.86   3.63   3.48   3.37   3.29   3.23   3.18
        .025       7.21   5.71   5.08   4.72   4.48   4.32   4.20   4.10   4.03
        .010      10.56   8.02   6.99   6.42   6.06   5.80   5.61   5.47   5.35
        .005      13.61  10.11   8.72   7.96   7.47   7.13   6.88   6.69   6.54
10      .100       3.29   2.92   2.73   2.61   2.52   2.46   2.41   2.38   2.35
        .050       4.96   4.10   3.71   3.48   3.33   3.22   3.14   3.07   3.02
        .025       6.94   5.46   4.83   4.47   4.24   4.07   3.95   3.85   3.78
        .010      10.04   7.56   6.55   5.99   5.64   5.39   5.20   5.06   4.94
        .005      12.83   9.43   8.08   7.34   6.87   6.54   6.30   6.12   5.97
11      .100       3.23   2.86   2.66   2.54   2.45   2.39   2.34   2.30   2.27
        .050       4.84   3.98   3.59   3.36   3.20   3.09   3.01   2.95   2.90
        .025       6.72   5.26   4.63   4.28   4.04   3.88   3.76   3.66   3.59
        .010       9.65   7.21   6.22   5.67   5.32   5.07   4.89   4.74   4.63
        .005      12.23   8.91   7.60   6.88   6.42   6.10   5.86   5.68   5.54
12      .100       3.18   2.81   2.61   2.48   2.39   2.33   2.28   2.24   2.21
        .050       4.75   3.89   3.49   3.26   3.11   3.00   2.91   2.85   2.80
        .025       6.55   5.10   4.47   4.12   3.89   3.73   3.61   3.51   3.44
        .010       9.33   6.93   5.95   5.41   5.06   4.82   4.64   4.50   4.39
        .005      11.75   8.51   7.23   6.52   6.07   5.76   5.52   5.35   5.20
13      .100       3.14   2.76   2.56   2.43   2.35   2.28   2.23   2.20   2.16
        .050       4.67   3.81   3.41   3.18   3.03   2.92   2.83   2.77   2.71
        .025       6.41   4.97   4.35   4.00   3.77   3.60   3.48   3.39   3.31
        .010       9.07   6.70   5.74   5.21   4.86   4.62   4.44   4.30   4.19
        .005      11.37   8.19   6.93   6.23   5.79   5.48   5.25   5.08   4.94
14      .100       3.10   2.73   2.52   2.39   2.31   2.24   2.19   2.15   2.12
        .050       4.60   3.74   3.34   3.11   2.96   2.85   2.76   2.70   2.65
        .025       6.30   4.86   4.24   3.89   3.66   3.50   3.38   3.29   3.21
        .010       8.86   6.51   5.56   5.04   4.69   4.46   4.28   4.14   4.03
        .005      11.06   7.92   6.68   6.00   5.56   5.26   5.03   4.86   4.72

Denom.  Prob. of                            Numerator df
df      larger F     10     12     15     20     24     30     40     60    120      ∞
1       .100      60.19  60.71  61.22  61.74  62.00  62.26  62.53  62.79  63.06  63.33
        .050      241.9  243.9  245.9  248.0  249.1  250.1  251.1  252.2  253.3  254.3
        .025      968.6  976.7  984.9  993.1  997.2   1001   1006   1010   1014   1018
        .010       6056   6106   6157   6209   6235   6261   6287   6313   6339   6366
        .005      24224  24426  24630  24836  24940  25044  25148  25253  25359  25465
2       .100       9.39   9.41   9.42   9.44   9.45   9.46   9.47   9.47   9.48   9.49
        .050      19.40  19.41  19.43  19.45  19.45  19.46  19.47  19.48  19.49  19.50
        .025      39.40  39.41  39.43  39.45  39.46  39.46  39.47  39.48  39.49  39.50
        .010      99.40  99.42  99.43  99.45  99.46  99.47  99.47  99.48  99.49  99.50
        .005      199.4  199.4  199.4  199.4  199.5  199.5  199.5  199.5  199.5  199.5
3       .100       5.23   5.22   5.20   5.18   5.18   5.17   5.16   5.15   5.14   5.13
        .050       8.79   8.74   8.70   8.66   8.64   8.62   8.59   8.57   8.55   8.53
        .025      14.42  14.34  14.25  14.17  14.12  14.08  14.04  13.99  13.95  13.90
        .010      27.23  27.05  26.87  26.69  26.60  26.50  26.41  26.32  26.22  26.13
        .005      43.69  43.39  43.08  42.78  42.62  42.47  42.31  42.15  41.99  41.83
4       .100       3.92   3.90   3.87   3.84   3.83   3.82   3.80   3.79   3.78   3.76
        .050       5.96   5.91   5.86   5.80   5.77   5.75   5.72   5.69   5.66   5.63
        .025       8.84   8.75   8.66   8.56   8.51   8.46   8.41   8.36   8.31   8.26
        .010      14.55  14.37  14.20  14.02  13.93  13.84  13.75  13.65  13.56  13.46
        .005      20.97  20.70  20.44  20.17  20.03  19.89  19.75  19.61  19.47  19.32
5       .100       3.30   3.27   3.24   3.21   3.19   3.17   3.16   3.14   3.12   3.10
        .050       4.74   4.68   4.62   4.56   4.53   4.50   4.46   4.43   4.40   4.36
        .025       6.62   6.52   6.43   6.33   6.28   6.23   6.18   6.12   6.07   6.02
        .010      10.05   9.89   9.72   9.55   9.47   9.38   9.29   9.20   9.11   9.02
        .005      13.62  13.38  13.15  12.90  12.78  12.66  12.53  12.40  12.27  12.14
6       .100       2.94   2.90   2.87   2.84   2.82   2.80   2.78   2.76   2.74   2.72
        .050       4.06   4.00   3.94   3.87   3.84   3.81   3.77   3.74   3.70   3.67
        .025       5.46   5.37   5.27   5.17   5.12   5.07   5.01   4.96   4.90   4.85
        .010       7.87   7.72   7.56   7.40   7.31   7.23   7.14   7.06   6.97   6.88
        .005      10.25  10.03   9.81   9.59   9.47   9.36   9.24   9.12   9.00   8.88
7       .100       2.70   2.67   2.63   2.59   2.58   2.56   2.54   2.51   2.49   2.47
        .050       3.64   3.57   3.51   3.44   3.41   3.38   3.34   3.30   3.27   3.23
        .025       4.76   4.67   4.57   4.47   4.42   4.36   4.31   4.25   4.20   4.14
        .010       6.62   6.47   6.31   6.16   6.07   5.99   5.91   5.82   5.74   5.65
        .005       8.38   8.18   7.97   7.75   7.65   7.53   7.42   7.31   7.19   7.08
8       .100       2.54   2.50   2.46   2.42   2.40   2.38   2.36   2.34   2.32   2.29
        .050       3.35   3.28   3.22   3.15   3.12   3.08   3.04   3.01   2.97   2.93
        .025       4.30   4.20   4.10   4.00   3.95   3.89   3.84   3.78   3.73   3.67
        .010       5.81   5.67   5.52   5.36   5.28   5.20   5.12   5.03   4.95   4.86
        .005       7.21   7.01   6.81   6.61   6.50   6.40   6.29   6.18   6.06   5.95
9       .100       2.42   2.38   2.34   2.30   2.28   2.25   2.23   2.21   2.18   2.16
        .050       3.14   3.07   3.01   2.94   2.90   2.86   2.83   2.79   2.75   2.71
        .025       3.96   3.87   3.77   3.67   3.61   3.56   3.51   3.45   3.39   3.33
        .010       5.26   5.11   4.96   4.81   4.73   4.65   4.57   4.48   4.40   4.31
        .005       6.42   6.23   6.03   5.83   5.73   5.62   5.52   5.41   5.30   5.19
10      .100       2.32   2.28   2.24   2.20   2.18   2.16   2.13   2.11   2.08   2.06
        .050       2.98   2.91   2.85   2.77   2.74   2.70   2.66   2.62   2.58   2.54
        .025       3.72   3.62   3.52   3.42   3.37   3.31   3.26   3.20   3.14   3.08
        .010       4.85   4.71   4.56   4.41   4.33   4.25   4.17   4.08   4.00   3.91
        .005       5.85   5.66   5.47   5.27   5.17   5.07   4.97   4.86   4.75   4.64
11      .100       2.25   2.21   2.17   2.12   2.10   2.08   2.05   2.03   2.00   1.97
        .050       2.85   2.79   2.72   2.65   2.61   2.57   2.53   2.49   2.45   2.40
        .025       3.53   3.43   3.33   3.23   3.17   3.12   3.06   3.00   2.94   2.88
        .010       4.54   4.40   4.25   4.10   4.02   3.94   3.86   3.78   3.69   3.60
        .005       5.42   5.24   5.05   4.86   4.76   4.65   4.55   4.44   4.34   4.23
12      .100       2.19   2.15   2.10   2.06   2.04   2.01   1.99   1.96   1.93   1.90
        .050       2.75   2.69   2.62   2.54   2.51   2.47   2.43   2.38   2.34   2.30
        .025       3.37   3.28   3.18   3.07   3.02   2.96   2.91   2.85   2.79   2.72
        .010       4.30   4.16   4.01   3.86   3.78   3.70   3.62   3.54   3.45   3.36
        .005       5.09   4.91   4.72   4.53   4.43   4.33   4.23   4.12   4.01   3.90
13      .100       2.14   2.10   2.05   2.01   1.98   1.96   1.93   1.90   1.88   1.85
        .050       2.67   2.60   2.53   2.46   2.42   2.38   2.34   2.30   2.25   2.21
        .025       3.25   3.15   3.05   2.95   2.89   2.84   2.78   2.72   2.66   2.60
        .010       4.10   3.96   3.82   3.66   3.59   3.51   3.43   3.34   3.25   3.17
        .005       4.82   4.64   4.46   4.27   4.17   4.07   3.97   3.87   3.76   3.65
14      .100       2.10   2.05   2.01   1.96   1.94   1.91   1.89   1.86   1.83   1.80
        .050       2.60   2.53   2.46   2.39   2.35   2.31   2.27   2.22   2.18   2.13
        .025       3.15   3.05   2.95   2.84   2.79   2.73   2.67   2.61   2.55   2.49
        .010       3.94   3.80   3.66   3.51   3.43   3.35   3.27   3.18   3.09   3.00
        .005       4.60   4.43   4.25   4.06   3.96   3.86   3.76   3.66   3.55   3.44

Denom.  Prob. of                         Numerator df
df      larger F      1      2      3      4      5      6      7      8      9
15      .100       3.07   2.70   2.49   2.36   2.27   2.21   2.16   2.12   2.09
        .050       4.54   3.68   3.29   3.06   2.90   2.79   2.71   2.64   2.59
        .025       6.20   4.77   4.15   3.80   3.58   3.41   3.29   3.20   3.12
        .010       8.68   6.36   5.42   4.89   4.56   4.32   4.14   4.00   3.89
        .005      10.80   7.70   6.48   5.80   5.37   5.07   4.85   4.67   4.54
16      .100       3.05   2.67   2.46   2.33   2.24   2.18   2.13   2.09   2.06
        .050       4.49   3.63   3.24   3.01   2.85   2.74   2.66   2.59   2.54
        .025       6.12   4.69   4.08   3.73   3.50   3.34   3.22   3.12   3.05
        .010       8.53   6.23   5.29   4.77   4.44   4.20   4.03   3.89   3.78
        .005      10.58   7.51   6.30   5.64   5.21   4.91   4.69   4.52   4.38
17      .100       3.03   2.64   2.44   2.31   2.22   2.15   2.10   2.06   2.03
        .050       4.45   3.59   3.20   2.96   2.81   2.70   2.61   2.55   2.49
        .025       6.04   4.62   4.01   3.66   3.44   3.28   3.16   3.06   2.98
        .010       8.40   6.11   5.18   4.67   4.34   4.10   3.93   3.79   3.68
        .005      10.38   7.35   6.16   5.50   5.07   4.78   4.56   4.39   4.25
18      .100       3.01   2.62   2.42   2.29   2.20   2.13   2.08   2.04   2.00
        .050       4.41   3.55   3.16   2.93   2.77   2.66   2.58   2.51   2.46
        .025       5.98   4.56   3.95   3.61   3.38   3.22   3.10   3.01   2.93
        .010       8.29   6.01   5.09   4.58   4.25   4.01   3.84   3.71   3.60
        .005      10.22   7.21   6.03   5.37   4.96   4.66   4.44   4.28   4.14
19      .100       2.99   2.61   2.40   2.27   2.18   2.11   2.06   2.02   1.98
        .050       4.38   3.52   3.13   2.90   2.74   2.63   2.54   2.48   2.42
        .025       5.92   4.51   3.90   3.56   3.33   3.17   3.05   2.96   2.88
        .010       8.18   5.93   5.01   4.50   4.17   3.94   3.77   3.63   3.52
        .005      10.07   7.09   5.92   5.27   4.85   4.56   4.34   4.18   4.04
20      .100       2.97   2.59   2.38   2.25   2.16   2.09   2.04   2.00   1.96
        .050       4.35   3.49   3.10   2.87   2.71   2.60   2.51   2.45   2.39
        .025       5.87   4.46   3.86   3.51   3.29   3.13   3.01   2.91   2.84
        .010       8.10   5.85   4.94   4.43   4.10   3.87   3.70   3.56   3.46
        .005       9.94   6.99   5.82   5.17   4.76   4.47   4.26   4.09   3.96
21      .100       2.96   2.57   2.36   2.23   2.14   2.08   2.02   1.98   1.95
        .050       4.32   3.47   3.07   2.84   2.68   2.57   2.49   2.42   2.37
        .025       5.83   4.42   3.82   3.48   3.25   3.09   2.97   2.87   2.80
        .010       8.02   5.78   4.87   4.37   4.04   3.81   3.64   3.51   3.40
        .005       9.83   6.89   5.73   5.09   4.68   4.39   4.18   4.01   3.88
22      .100       2.95   2.56   2.35   2.22   2.13   2.06   2.01   1.97   1.93
        .050       4.30   3.44   3.05   2.82   2.66   2.55   2.46   2.40   2.34
        .025       5.79   4.38   3.78   3.44   3.22   3.05   2.93   2.84   2.76
        .010       7.95   5.72   4.82   4.31   3.99   3.76   3.59   3.45   3.35
        .005       9.73   6.81   5.65   5.02   4.61   4.32   4.11   3.94   3.81
23      .100       2.94   2.55   2.34   2.21   2.11   2.05   1.99   1.95   1.92
        .050       4.28   3.42   3.03   2.80   2.64   2.53   2.44   2.37   2.32
        .025       5.75   4.35   3.75   3.41   3.18   3.02   2.90   2.81   2.73
        .010       7.88   5.66   4.76   4.26   3.94   3.71   3.54   3.41   3.30
        .005       9.63   6.73   5.58   4.95   4.54   4.26   4.05   3.88   3.75
24      .100       2.93   2.54   2.33   2.19   2.10   2.04   1.98   1.94   1.91
        .050       4.26   3.40   3.01   2.78   2.62   2.51   2.42   2.36   2.30
        .025       5.72   4.32   3.72   3.38   3.15   2.99   2.87   2.78   2.70
        .010       7.82   5.61   4.72   4.22   3.90   3.67   3.50   3.36   3.26
        .005       9.55   6.66   5.52   4.89   4.49   4.20   3.99   3.83   3.69
25      .100       2.92   2.53   2.32   2.18   2.09   2.02   1.97   1.93   1.89
        .050       4.24   3.39   2.99   2.76   2.60   2.49   2.40   2.34   2.28
        .025       5.69   4.29   3.69   3.35   3.13   2.97   2.85   2.75   2.68
        .010       7.77   5.57   4.68   4.18   3.85   3.63   3.46   3.32   3.22
        .005       9.48   6.60   5.46   4.84   4.43   4.15   3.94   3.78   3.64
26      .100       2.91   2.52   2.31   2.17   2.08   2.01   1.96   1.92   1.88
        .050       4.23   3.37   2.98   2.74   2.59   2.47   2.39   2.32   2.27
        .025       5.66   4.27   3.67   3.33   3.10   2.94   2.82   2.73   2.65
        .010       7.72   5.53   4.64   4.14   3.82   3.59   3.42   3.29   3.18
        .005       9.41   6.54   5.41   4.79   4.38   4.10   3.89   3.73   3.60
27      .100       2.90   2.51   2.30   2.17   2.07   2.00   1.95   1.91   1.87
        .050       4.21   3.35   2.96   2.73   2.57   2.46   2.37   2.31   2.25
        .025       5.63   4.24   3.65   3.31   3.08   2.92   2.80   2.71   2.63
        .010       7.68   5.49   4.60   4.11   3.78   3.56   3.39   3.26   3.15
        .005       9.34   6.49   5.36   4.74   4.34   4.06   3.85   3.69   3.56
28      .100       2.89   2.50   2.29   2.16   2.06   2.00   1.94   1.90   1.87
        .050       4.20   3.34   2.95   2.71   2.56   2.45   2.36   2.29   2.24
        .025       5.61   4.22   3.63   3.29   3.06   2.90   2.78   2.69   2.61
        .010       7.64   5.45   4.57   4.07   3.75   3.53   3.36   3.23   3.12
        .005       9.28   6.44   5.32   4.70   4.30   4.02   3.81   3.65   3.52

Denom.  Prob. of                            Numerator df
df      larger F     10     12     15     20     24     30     40     60    120      ∞
15      .100       2.06   2.02   1.97   1.92   1.90   1.87   1.85   1.82   1.79   1.76
        .050       2.54   2.48   2.40   2.33   2.29   2.25   2.20   2.16   2.11   2.07
        .025       3.06   2.96   2.86   2.76   2.70   2.64   2.59   2.52   2.46   2.40
        .010       3.80   3.67   3.52   3.37   3.29   3.21   3.13   3.05   2.96   2.87
        .005       4.42   4.25   4.07   3.88   3.79   3.69   3.58   3.48   3.37   3.26
16      .100       2.03   1.99   1.94   1.89   1.87   1.84   1.81   1.78   1.75   1.72
        .050       2.49   2.42   2.35   2.28   2.24   2.19   2.15   2.11   2.06   2.01
        .025       2.99   2.89   2.79   2.68   2.63   2.57   2.51   2.45   2.38   2.32
        .010       3.69   3.55   3.41   3.26   3.18   3.10   3.02   2.93   2.84   2.75
        .005       4.27   4.10   3.92   3.73   3.64   3.54   3.44   3.33   3.22   3.11
17      .100       2.00   1.96   1.91   1.86   1.84   1.81   1.78   1.75   1.72   1.69
        .050       2.45   2.38   2.31   2.23   2.19   2.15   2.10   2.06   2.01   1.96
        .025       2.92   2.82   2.72   2.62   2.56   2.50   2.44   2.38   2.32   2.25
        .010       3.59   3.46   3.31   3.16   3.08   3.00   2.92   2.83   2.75   2.65
        .005       4.14   3.97   3.79   3.61   3.51   3.41   3.31   3.21   3.10   2.98
18      .100       1.98   1.93   1.89   1.84   1.81   1.78   1.75   1.72   1.69   1.66
        .050       2.41   2.34   2.27   2.19   2.15   2.11   2.06   2.02   1.97   1.92
        .025       2.87   2.77   2.67   2.56   2.50   2.44   2.38   2.32   2.26   2.19
        .010       3.51   3.37   3.23   3.08   3.00   2.92   2.84   2.75   2.66   2.57
        .005       4.03   3.86   3.68   3.50   3.40   3.30   3.20   3.10   2.99   2.87
19      .100       1.96   1.91   1.86   1.81   1.79   1.76   1.73   1.70   1.67   1.63
        .050       2.38   2.31   2.23   2.16   2.11   2.07   2.03   1.98   1.93   1.88
        .025       2.82   2.72   2.62   2.51   2.45   2.39   2.33   2.27   2.20   2.13
        .010       3.43   3.30   3.15   3.00   2.92   2.84   2.76   2.67   2.58   2.49
        .005       3.93   3.76   3.59   3.40   3.31   3.21   3.11   3.00   2.89   2.78
20      .100       1.94   1.89   1.84   1.79   1.77   1.74   1.71   1.68   1.64   1.61
        .050       2.35   2.28   2.20   2.12   2.08   2.04   1.99   1.95   1.90   1.84
        .025       2.77   2.68   2.57   2.46   2.41   2.35   2.29   2.22   2.16   2.09
        .010       3.37   3.23   3.09   2.94   2.86   2.78   2.69   2.61   2.52   2.42
        .005       3.85   3.68   3.50   3.32   3.22   3.12   3.02   2.92   2.81   2.69
21      .100       1.92   1.87   1.83   1.78   1.75   1.72   1.69   1.66   1.62   1.59
        .050       2.32   2.25   2.18   2.10   2.05   2.01   1.96   1.92   1.87   1.81
        .025       2.73   2.64   2.53   2.42   2.37   2.31   2.25   2.18   2.11   2.04
        .010       3.31   3.17   3.03   2.88   2.80   2.72   2.64   2.55   2.46   2.36
        .005       3.77   3.60   3.43   3.24   3.15   3.05   2.95   2.84   2.73   2.61
22      .100       1.90   1.86   1.81   1.76   1.73   1.70   1.67   1.64   1.60   1.57
        .050       2.30   2.23   2.15   2.07   2.03   1.98   1.94   1.89   1.84   1.78
        .025       2.70   2.60   2.50   2.39   2.33   2.27   2.21   2.14   2.08   2.00
        .010       3.26   3.12   2.98   2.83   2.75   2.67   2.58   2.50   2.40   2.31
        .005       3.70   3.54   3.36   3.18   3.08   2.98   2.88   2.77   2.66   2.55
23      .100       1.89   1.84   1.80   1.74   1.72   1.69   1.66   1.62   1.59   1.55
        .050       2.27   2.20   2.13   2.05   2.01   1.96   1.91   1.86   1.81   1.76
        .025       2.67   2.57   2.47   2.36   2.30   2.24   2.18   2.11   2.04   1.97
        .010       3.21   3.07   2.93   2.78   2.70   2.62   2.54   2.45   2.35   2.26
        .005       3.64   3.47   3.30   3.12   3.02   2.92   2.82   2.71   2.60   2.48
24      .100       1.88   1.83   1.78   1.73   1.70   1.67   1.64   1.61   1.57   1.53
        .050       2.25   2.18   2.11   2.03   1.98   1.94   1.89   1.84   1.79   1.73
        .025       2.64   2.54   2.44   2.33   2.27   2.21   2.15   2.08   2.01   1.94
        .010       3.17   3.03   2.89   2.74   2.66   2.58   2.49   2.40   2.31   2.21
        .005       3.59   3.42   3.25   3.06   2.97   2.87   2.77   2.66   2.55   2.43
25      .100       1.87   1.82   1.77   1.72   1.69   1.66   1.63   1.59   1.56   1.52
        .050       2.24   2.16   2.09   2.01   1.96   1.92   1.87   1.82   1.77   1.71
        .025       2.61   2.51   2.41   2.30   2.24   2.18   2.12   2.05   1.98   1.91
        .010       3.13   2.99   2.85   2.70   2.62   2.54   2.45   2.36   2.27   2.17
        .005       3.54   3.37   3.20   3.01   2.92   2.82   2.72   2.61   2.50   2.38
26      .100       1.86   1.81   1.76   1.71   1.68   1.65   1.61   1.58   1.54   1.50
        .050       2.22   2.15   2.07   1.99   1.95   1.90   1.85   1.80   1.75   1.69
        .025       2.59   2.49   2.39   2.28   2.22   2.16   2.09   2.03   1.95   1.88
        .010       3.09   2.96   2.81   2.66   2.58   2.50   2.42   2.33   2.23   2.13
        .005       3.49   3.33   3.15   2.97   2.87   2.77   2.67   2.56   2.45   2.33
27      .100       1.85   1.80   1.75   1.70   1.67   1.64   1.60   1.57   1.53   1.49
        .050       2.20   2.13   2.06   1.97   1.93   1.88   1.84   1.79   1.73   1.67
        .025       2.57   2.47   2.36   2.25   2.19   2.13   2.07   2.00   1.93   1.85
        .010       3.06   2.93   2.78   2.63   2.55   2.47   2.38   2.29   2.20   2.10
        .005       3.45   3.28   3.11   2.93   2.83   2.73   2.63   2.52   2.41   2.29
28      .100       1.84   1.79   1.74   1.69   1.66   1.63   1.59   1.56   1.52   1.48
        .050       2.19   2.12   2.04   1.96   1.91   1.87   1.82   1.77   1.71   1.65
        .025       2.55   2.45   2.34   2.23   2.17   2.11   2.05   1.98   1.91   1.83
        .010       3.03   2.90   2.75   2.60   2.52   2.44   2.35   2.26   2.17   2.06
        .005       3.41   3.25   3.07   2.89   2.79   2.69   2.59   2.48   2.37   2.25

Denom.  Prob. of                         Numerator df
df      larger F      1      2      3      4      5      6      7      8      9
29      .100       2.89   2.50   2.28   2.15   2.06   1.99   1.93   1.89   1.86
        .050       4.18   3.33   2.93   2.70   2.55   2.43   2.35   2.28   2.22
        .025       5.59
120     .050       3.92   3.07   2.68   2.45   2.29   2.17   2.09   2.02   1.96
        .025       5.15   3.80   3.23   2.89   2.67   2.52   2.39   2.30   2.22
        .010       6.85   4.79   3.95   3.48   3.17   2.96   2.79   2.66   2.56
        .005       8.18   5.54   4.50   3.92   3.55   3.28   3.09   2.93   2.81
∞       .100       2.71   2.30   2.08   1.94   1.85   1.77   1.72   1.67   1.63
        .050       3.84   3.00   2.60   2.37   2.21   2.10   2.01   1.94   1.88
        .025       5.02   3.69   3.12   2.79   2.57   2.41   2.29   2.19   2.11
        .010       6.63   4.61   3.78   3.32   3.02   2.80   2.64   2.51   2.41
        .005       7.88   5.30   4.28   3.72   3.35   3.09   2.90   2.74   2.62

Source: A portion of "Tables of percentage points of the inverted beta (F) distribution," Biometrika, vol. 33 (1943) by M.
Merrington and C. M. Thompson and from Table 18 of Biometrika Tables for Statisticians, vol. I, Cambridge University Press,
1954, edited by E. S. Pearson and H. O. Hartley. Reproduced with permission of the authors, editors, and Biometrika trustees.
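Each entry of Table A-4 is the value of F, for the given numerator and denominator df, that is exceeded with the listed probability. The sketch below (Python standard library only; all function names are hypothetical) uses the identity P(F > f) = I_x(df_den/2, df_num/2) with x = df_den/(df_den + df_num·f), evaluates the regularized incomplete beta function I_x with the continued-fraction expansion used, for example, in Numerical Recipes, and then solves for f by bisection. It is a cross-check under these assumptions, not the method used to prepare the printed table.

```python
import math


def _beta_cf(a, b, x, max_iter=500, eps=1e-15):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the fraction.
        num = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 + num * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + num / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the fraction.
        num = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 + num * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + num / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):          # fraction converges fast here
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b   # use the symmetry relation


def f_upper_p(f, df_num, df_den):
    """P(F > f) for an F variable with (df_num, df_den) degrees of freedom."""
    if f <= 0.0:
        return 1.0
    x = df_den / (df_den + df_num * f)
    return reg_inc_beta(df_den / 2.0, df_num / 2.0, x)


def f_critical(p_upper, df_num, df_den, tol=1e-9):
    """Value of F exceeded with probability p_upper."""
    lo, hi = 0.0, 1.0
    while f_upper_p(hi, df_num, df_den) > p_upper:   # bracket the root
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):               # bisection
        mid = 0.5 * (lo + hi)
        if f_upper_p(mid, df_num, df_den) > p_upper:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(round(f_critical(0.05, 1, 10), 2), round(f_critical(0.01, 4, 10), 2))
```

Because the numerator df is the first argument, `f_critical(0.05, 4, 10)` corresponds to the .050 row for denominator df 10 under numerator df 4.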
Table A-5.  Upper 5% points, Q (Nash, 1965).

Degrees of          Number of treatments, a
freedom       2      3      4      5      6      7      8      9     10
1          18.0  26.98  32.82  37.08  40.41  43.12  45.40  47.36  49.07
2          6.08   8.33   9.80  10.88  11.74  12.44  13.03  13.54  13.99
3          4.50   5.91   6.82   7.50   8.04   8.48   8.85   9.18   9.46
4          3.93   5.04   5.76   6.29   6.71   7.05   7.35   7.60   7.83
5          3.64   4.60   5.22   5.67   6.03   6.33   6.58   6.80   6.99
6          3.46   4.34   4.90   5.30   5.63   5.90   6.12   6.32   6.49
7          3.34   4.16   4.68   5.06   5.36   5.61   5.82   6.00   6.16
8          3.26   4.04   4.53   4.89   5.17   5.40   5.60   5.77   5.92
9          3.20   3.95   4.44   4.76   5.02   5.24   5.43   5.59   5.74
10         3.15   3.88   4.33   4.65   4.91   5.12   5.30   5.46   5.60
11         3.11   3.82   4.26   4.57   4.82   5.03   5.20   5.35   5.49
12         3.08   3.77   4.20   4.51   4.75   4.95   5.12   5.27   5.39
13         3.06   3.73   4.15   4.45   4.69   4.88   5.05   5.19   5.32
14         3.03   3.70   4.11   4.41   4.64   4.83   4.99   5.13   5.25
15         3.01   3.67   4.08   4.37   4.59   4.78   4.94   5.08   5.20
16         3.00   3.65   4.05   4.33   4.56   4.74   4.90   5.03   5.15
17         2.98   3.63   4.02   4.30   4.52   4.70   4.86   4.99   5.11
18         2.97   3.61   4.00   4.28   4.49   4.67   4.82   4.96   5.07
19         2.96   3.59   3.98   4.25   4.47   4.65   4.79   4.92   5.04
20         2.95   3.58   3.96   4.23   4.45   4.62   4.77   4.90   5.01
30         2.89   3.49   3.85   4.10   4.30   4.46   4.60   4.72   4.82
∞          2.77   3.31   3.63   3.86   4.03   4.17   4.29   4.39   4.47

Table 9.3 is taken from J. Pachares, "Table of the Upper 10% Points of the Studentized Range," Biometrika
46: 461-466 (1959), with additional table by Dr. Leon Harter, by permission of Professor E. S. Pearson,
Editor of Biometrika.